Modified strains for the production of recombinant silk

ABSTRACT

Disclosed herein are modified strains for reducing degradation of recombinantly expressed products secreted from a host organism and methods of using the modified strains. In some embodiments, to attenuate a protease activity in Pichia pastoris, the genes encoding the proteases are inactivated or mutated to reduce or eliminate activity. In preferred strains, the protease activity of proteases encoded by PAS_chr4_0584 (YPS1-1) and PAS_chr3_1157 (YPS1-2) (e.g., polypeptides comprising SEQ ID NO: 66 and 67) is attenuated.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 25, 2017, is named 37324US_CRF_sequencelisting.txt and is 388,936 bytes in size.

FIELD OF THE INVENTION

The present disclosure relates to methods of strain optimization to produce or enhance production of proteins or metabolites from cells. The present disclosure also relates to compositions resulting from those methods. In particular, the disclosure relates to yeast cells selected or genetically engineered to reduce degradation of recombinant proteins expressed by the yeast cells, and to methods of cultivating yeast cells for the production of useful compounds.

BACKGROUND OF THE INVENTION

The methylotrophic yeast Pichia pastoris is widely used in the production of recombinant proteins. P. pastoris grows to high cell density, provides tightly controlled methanol-inducible transgene expression and efficiently secretes heterologous proteins in defined media.

However, during culture of a strain of P. pastoris, recombinantly expressed proteins may be degraded before they can be collected, resulting in a mixture of proteins that includes fragments of recombinantly expressed proteins and a decreased yield of full-length recombinant proteins. What is needed, therefore, are tools and engineered strains to mitigate protein degradation in P. pastoris.

SUMMARY OF THE INVENTION

In some embodiments, provided herein is a Pichia pastoris microorganism, in which the activity of a YPS1-1 protease and a YPS1-2 protease has been attenuated or eliminated, wherein said microorganism expresses a recombinant polypeptide.

In some embodiments, the YPS1-1 protease comprises a polypeptide sequence at least 95% identical to SEQ ID NO: 67. In some embodiments, the YPS1-1 protease comprises SEQ ID NO: 67. In some embodiments, the YPS1-1 protease is encoded by a YPS1-1 gene. In some embodiments, the YPS1-1 gene comprises a polynucleotide sequence at least 95% identical to SEQ ID NO: 1. In some embodiments, the YPS1-1 gene comprises at least 15, 20, 25, 30, 40, or 50 contiguous nucleotides of SEQ ID NO: 1. In some embodiments, the YPS1-1 gene comprises SEQ ID NO: 1. In some embodiments, the YPS1-1 gene is at locus PAS_chr4_0584 of said microorganism.

In some embodiments, the YPS1-2 protease comprises a polypeptide sequence at least 95% identical to SEQ ID NO: 68. In some embodiments, the YPS1-2 protease comprises SEQ ID NO: 68. In some embodiments, the YPS1-2 protease is encoded by a YPS1-2 gene. In some embodiments, the YPS1-2 gene comprises a polynucleotide sequence at least 95% identical to SEQ ID NO: 2. In some embodiments, the YPS1-2 gene comprises at least 15, 20, 25, 30, 40, or 50 contiguous nucleotides of SEQ ID NO: 2. In some embodiments, the YPS1-2 gene comprises SEQ ID NO: 2. In some embodiments, the YPS1-2 gene is at locus PAS_chr3_1157 of said microorganism.

In some embodiments, the YPS1-1 gene or said YPS1-2 gene, or both, has been mutated or knocked out.

In some embodiments, the microorganism expresses a recombinant protein. In some embodiments, the recombinant protein comprises at least one block polypeptide sequence from a silk protein. In some embodiments, the recombinant protein comprises a silk-like polypeptide. In some embodiments, the silk-like polypeptide comprises one or more repeat sequences {GGY-[GPG-X₁]n₁-GPS-(A)n₂}n₃ (SEQ ID NO: 514), wherein X₁=SGGQQ (SEQ ID NO: 515) or GAGQQ (SEQ ID NO: 516) or GQGPY (SEQ ID NO: 517) or AGQQ (SEQ ID NO: 518) or SQ; n₁ is from 4 to 8; n₂ is from 6 to 20; and n₃ is from 2 to 20. In some embodiments, the silk-like polypeptide comprises a polypeptide sequence encoded by SEQ ID NO: 462.
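
For illustration only, the repeat formula above can be written as a regular expression. The following Python sketch is not part of the disclosure: the names X1, UNIT, REPEAT and has_silk_repeat are hypothetical, and the sketch assumes that the hyphens in the formula are notational separators rather than residues.

```python
import re

# Allowed X1 segments from the formula (SEQ ID NO: 515-518, or SQ).
X1 = "(?:SGGQQ|GAGQQ|GQGPY|AGQQ|SQ)"

# One repeat unit: GGY, then 4 to 8 copies of GPG-X1, then GPS, then 6 to 20 alanines.
UNIT = rf"GGY(?:GPG{X1}){{4,8}}GPSA{{6,20}}"

# The full motif: 2 to 20 consecutive repeat units.
REPEAT = re.compile(rf"(?:{UNIT}){{2,20}}")


def has_silk_repeat(sequence: str) -> bool:
    """Return True if the sequence contains at least one match of the motif."""
    return REPEAT.search(sequence.upper()) is not None


# Toy sequence built for the example; it is not taken from the Sequence Listing.
unit = "GGY" + "GPGSGGQQ" * 4 + "GPS" + "A" * 6
print(has_silk_repeat(unit * 2))
```

A single unit is rejected because n₃ must be at least 2, and a unit with fewer than four GPG-X₁ segments is rejected because n₁ must be at least 4.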

In some embodiments, the activity of one or more additional proteases in the microorganism has been attenuated or eliminated. In some embodiments, the one or more additional proteases comprises YPS1-5, MCK7, or YPS1-3.

In some embodiments, the YPS1-5 gene is at locus PAS_chr3_0688 of said microorganism.

In some embodiments, the MCK7 protease is encoded by a MCK7 gene comprising a polynucleotide sequence at least 95% identical to SEQ ID NO: 7. In some embodiments, the MCK7 gene comprises at least 15, 20, 25, 30, 40, or 50 contiguous nucleotides of SEQ ID NO: 7. In some embodiments, the MCK7 gene comprises SEQ ID NO: 7. In some embodiments, the MCK7 gene is at locus PAS_chr1-1_0379 of said microorganism.

In some embodiments, the YPS1-3 protease is encoded by a YPS1-3 gene comprising a polynucleotide sequence at least 95% identical to SEQ ID NO: 3. In some embodiments, the YPS1-3 gene comprises at least 15, 20, 25, 30, 40, or 50 contiguous nucleotides of SEQ ID NO: 3. In some embodiments, the YPS1-3 gene comprises SEQ ID NO: 3. In some embodiments, the YPS1-3 gene is at locus PAS_chr3_0299 of said microorganism.

In some embodiments, the one or more additional proteases comprise a polypeptide sequence at least 95% identical to a polypeptide sequence selected from the group consisting of: SEQ ID NO: 68-130. In some embodiments, the one or more additional proteases comprise a polypeptide sequence selected from the group consisting of: SEQ ID NO: 68-130. In some embodiments, the one or more additional proteases are encoded by a polynucleotide sequence at least 95% identical to a polynucleotide sequence selected from the group consisting of: SEQ ID NO: 3-66. In some embodiments, the one or more additional proteases are encoded by a polynucleotide sequence comprising at least 15, 20, 25, 30, 40, or 50 contiguous nucleotides of a polynucleotide sequence selected from the group consisting of: SEQ ID NO: 3-66.
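
The "at least 15, 20, 25, 30, 40, or 50 contiguous nucleotides" language used throughout this summary amounts to asking for the longest run of nucleotides that a candidate sequence shares with a reference sequence. A minimal Python sketch of that check follows. The function names and the toy sequences are hypothetical and are not taken from the Sequence Listing, and the sketch compares one strand only.

```python
def longest_shared_run(candidate: str, reference: str) -> int:
    """Length of the longest contiguous stretch present in both sequences."""
    candidate, reference = candidate.upper(), reference.upper()
    best = 0
    # prev[j] holds the length of the shared run ending at the previous
    # candidate position and reference position j.
    prev = [0] * (len(reference) + 1)
    for c in candidate:
        cur = [0] * (len(reference) + 1)
        for j, r in enumerate(reference, start=1):
            if c == r:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def has_contiguous_nucleotides(candidate: str, reference: str, minimum: int) -> bool:
    """True if the candidate shares at least `minimum` contiguous nucleotides."""
    return longest_shared_run(candidate, reference) >= minimum


# Toy sequences for illustration only.
reference = "ATGCGTACGTTAGCCTAGGATCCA"
candidate = "TTTT" + reference[3:20] + "CCCC"
print(has_contiguous_nucleotides(candidate, reference, 15))
```

The dynamic-programming table costs time proportional to the product of the two lengths, which is acceptable for gene-sized inputs; a suffix-based index would be a reasonable substitute for genome-scale searches.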

In some embodiments, the microorganism comprises a 3×, 4× or 5× protease knockout.

Also provided herein, according to some embodiments of the invention, is a Pichia pastoris engineered microorganism comprising YPS1-1 and YPS1-2 activity reduced by a mutation or deletion of the YPS1-1 gene comprising SEQ ID NO: 1 and the YPS1-2 gene comprising SEQ ID NO: 2, wherein said microorganism further comprises a recombinantly expressed protein comprising a polypeptide sequence encoded by SEQ ID NO: 462.

In some embodiments, also provided herein is a cell culture comprising a protease-mitigated microorganism as described herein.

Also provided herein, according to some embodiments, is a cell culture comprising a microorganism whose YPS1-1 and YPS1-2 activity has been attenuated or eliminated as described herein, wherein the microorganism recombinantly expresses a protein, wherein said recombinantly expressed protein is less degraded than in a cell culture comprising an otherwise identical Pichia pastoris microorganism whose YPS1-1 and YPS1-2 activity has not been attenuated or eliminated.

In some embodiments, provided herein is a method of producing a recombinant protein with reduced degradation, comprising: culturing a microorganism whose YPS1-1 and YPS1-2 activity has been attenuated or eliminated as described herein in a culture medium under conditions suitable for expression of the recombinantly expressed protein; and isolating the recombinant protein from the microorganism or the culture medium.

In some embodiments, the recombinant protein is secreted from said microorganism, and isolating said recombinant protein comprises collecting a culture medium comprising said secreted recombinant protein. In some embodiments, the recombinant protein has a decreased level of degradation as compared to said recombinant protein produced by an otherwise identical microorganism wherein said YPS1-1 and said YPS1-2 protease activity has not been attenuated or eliminated.

Also provided herein is a method of modifying Pichia pastoris to reduce the degradation of a recombinantly expressed protein, comprising knocking out or mutating a gene encoding a YPS1-1 protein and a YPS1-2 protein. In some embodiments, the method of modifying Pichia pastoris to reduce the degradation of a recombinantly expressed protein further comprises knocking out or mutating one or more additional genes encoding a YPS1-3 protein, a YPS1-5 protein, or an MCK7 protein. In some embodiments, the method of modifying Pichia pastoris to reduce the degradation of a recombinantly expressed protein further comprises knocking out one or more genes encoding a protein comprising a polypeptide selected from the group consisting of SEQ ID NO: 68-130.

In some embodiments, the recombinantly expressed protein comprises a polyA sequence comprising at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 contiguous alanine residues (SEQ ID NO: 519). In some embodiments, the recombinantly expressed protein comprises a silk-like polypeptide. In some embodiments, the silk-like polypeptide comprises one or more repeat sequences {GGY-[GPG-X₁]n₁-GPS-(A)n₂}n₃ (SEQ ID NO: 514), wherein X₁=SGGQQ (SEQ ID NO: 515) or GAGQQ (SEQ ID NO: 516) or GQGPY (SEQ ID NO: 517) or AGQQ (SEQ ID NO: 518) or SQ; n₁ is from 4 to 8; n₂ is from 6 to 20; and n₃ is from 2 to 20. In some embodiments, the recombinantly expressed protein comprises a polypeptide sequence encoded by SEQ ID NO: 462.
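
The polyA criterion above can be checked by measuring the longest run of contiguous alanine residues in a polypeptide written in one-letter code. The Python sketch below is illustrative only; longest_alanine_run and has_poly_a are hypothetical names, and the example sequence is a toy input.

```python
import re


def longest_alanine_run(polypeptide: str) -> int:
    """Length of the longest stretch of contiguous alanine (A) residues."""
    runs = re.findall(r"A+", polypeptide.upper())
    return max((len(run) for run in runs), default=0)


def has_poly_a(polypeptide: str, minimum: int = 2) -> bool:
    """True if the polypeptide has at least `minimum` contiguous alanines."""
    return longest_alanine_run(polypeptide) >= minimum


# Toy sequence for illustration only.
print(longest_alanine_run("GGYAAAGPSAAAAAAG"))
```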

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of various embodiments of the invention.

FIG. 1 is a plasmid map for KU70 deletion with a zeocin resistance marker.

FIG. 2 is a plasmid map of a plasmid comprising a nourseothricin marker used with homology arms for targeted protease gene deletion.

FIG. 3A and FIG. 3B are cassettes for protease knockout with homology arms targeting the desired protease gene flanking a nourseothricin resistance marker.

FIG. 4 is a representative western blot of protein isolated from single KO strains to show protein degradation from these strains.

FIG. 5 is a representative western blot of protein isolated from double KO strains to show protein degradation from these strains.

FIG. 6 is a representative western blot of protein isolated from 2×, 3×, 4×, and 5× protease KO strains subcultured in BMGY or YPD to show protein degradation in these strains.

DETAILED DESCRIPTION

The details of various embodiments of the invention are set forth in the description below. Other features, objects, and advantages of the invention will be apparent from the description and the drawings, and from the claims.

Definitions

Unless otherwise defined herein, scientific and technical terms used in connection with the present invention shall have the meanings that are commonly understood by those of ordinary skill in the art. Further, unless otherwise required by context, singular terms shall include the plural and plural terms shall include the singular. The terms “a” and “an” include plural references unless the context dictates otherwise. Generally, nomenclatures used in connection with, and techniques of, biochemistry, enzymology, molecular and cellular biology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those well-known and commonly used in the art.

The following terms, unless otherwise indicated, shall be understood to have the following meanings:

The term “polynucleotide” or “nucleic acid molecule” refers to a polymeric form of nucleotides of at least 10 bases in length. The term includes DNA molecules (e.g., cDNA or genomic or synthetic DNA) and RNA molecules (e.g., mRNA or synthetic RNA), as well as analogs of DNA or RNA containing non-natural nucleotide analogs, non-native internucleoside bonds, or both. The nucleic acid can be in any topological conformation. For instance, the nucleic acid can be single-stranded, double-stranded, triple-stranded, quadruplexed, partially double-stranded, branched, hairpinned, circular, or in a padlocked conformation.

Unless otherwise indicated, and as an example for all sequences described herein under the general format “SEQ ID NO:”, “nucleic acid comprising SEQ ID NO: 1” refers to a nucleic acid, at least a portion of which has either (i) the sequence of SEQ ID NO: 1, or (ii) a sequence complementary to SEQ ID NO: 1. The choice between the two is dictated by the context. For instance, if the nucleic acid is used as a probe, the choice between the two is dictated by the requirement that the probe be complementary to the desired target.

An “isolated” RNA, DNA or a mixed polymer is one which is substantially separated from other cellular components that naturally accompany the native polynucleotide in its natural host cell, e.g., ribosomes, polymerases and genomic sequences with which it is naturally associated.

An “isolated” organic molecule (e.g., a silk protein) is one which is substantially separated from the cellular components (membrane lipids, chromosomes, proteins) of the host cell from which it originated, or from the medium in which the host cell was cultured. The term does not require that the biomolecule has been separated from all other chemicals, although certain isolated biomolecules may be purified to near homogeneity.

The term “recombinant” refers to a biomolecule, e.g., a gene or protein, that (1) has been removed from its naturally occurring environment, (2) is not associated with all or a portion of a polynucleotide in which the gene is found in nature, (3) is operatively linked to a polynucleotide which it is not linked to in nature, or (4) does not occur in nature. The term “recombinant” can be used in reference to cloned DNA isolates, chemically synthesized polynucleotide analogs, or polynucleotide analogs that are biologically synthesized by heterologous systems, as well as proteins and/or mRNAs encoded by such nucleic acids.

An endogenous nucleic acid sequence in the genome of an organism (or the encoded protein product of that sequence) is deemed “recombinant” herein if a heterologous sequence is placed adjacent to the endogenous nucleic acid sequence, such that the expression of this endogenous nucleic acid sequence is altered. In this context, a heterologous sequence is a sequence that is not naturally adjacent to the endogenous nucleic acid sequence, whether or not the heterologous sequence is itself endogenous (originating from the same host cell or progeny thereof) or exogenous (originating from a different host cell or progeny thereof). By way of example, a promoter sequence can be substituted (e.g., by homologous recombination) for the native promoter of a gene in the genome of a host cell, such that this gene has an altered expression pattern. This gene would now become “recombinant” because it is separated from at least some of the sequences that naturally flank it.

A nucleic acid is also considered “recombinant” if it contains any modifications that do not naturally occur to the corresponding nucleic acid in a genome. For instance, an endogenous coding sequence is considered “recombinant” if it contains an insertion, deletion or a point mutation introduced artificially, e.g., by human intervention. A “recombinant nucleic acid” also includes a nucleic acid integrated into a host cell chromosome at a heterologous site and a nucleic acid construct present as an episome.

As used herein, the phrase “degenerate variant” of a reference nucleic acid sequence encompasses nucleic acid sequences that can be translated, according to the standard genetic code, to provide an amino acid sequence identical to that translated from the reference nucleic acid sequence. The term “degenerate oligonucleotide” or “degenerate primer” is used to signify an oligonucleotide capable of hybridizing with target nucleic acid sequences that are not necessarily identical in sequence but that are homologous to one another within one or more particular segments.
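
The definition of “degenerate variant” can be illustrated by translating two coding sequences with the standard genetic code and comparing the results. The Python sketch below is an illustration rather than part of the disclosure; the names CODON_TABLE, translate and is_degenerate_variant are hypothetical, the sketch assumes both inputs are in frame from their first nucleotide, and the example sequences are toy inputs.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA or RNA sequence; '*' marks a stop codon."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(candidate: str, reference: str) -> bool:
    """True if both sequences encode the same amino acid sequence."""
    return translate(candidate) == translate(reference)


# GCT, GCC, GCA and GCG all encode alanine, so these two toy sequences
# differ at the nucleotide level but encode the same peptide.
print(is_degenerate_variant("ATGGCCGCG", "ATGGCTGCA"))
```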

The term “percent sequence identity” or “identical” in the context of nucleic acid sequences refers to the residues in the two sequences which are the same when aligned for maximum correspondence. The length of sequence identity comparison may be over a stretch of at least about nine nucleotides, usually at least about 20 nucleotides, more usually at least about 24 nucleotides, typically at least about 28 nucleotides, more typically at least about 32 nucleotides, and preferably at least about 36 or more nucleotides. There are a number of different algorithms known in the art which can be used to measure nucleotide sequence identity. For instance, polynucleotide sequences can be compared using FASTA, Gap or Bestfit, which are programs in Wisconsin Package Version 10.0, Genetics Computer Group (GCG), Madison, Wis. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990) (hereby incorporated by reference in its entirety). For instance, percent sequence identity between nucleic acid sequences can be determined using FASTA with its default parameters (a word size of 6 and the NOPAM factor for the scoring matrix) or using Gap with its default parameters as provided in GCG Version 6.1, herein incorporated by reference. Alternatively, sequences can be compared using the computer program, BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).
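
Once two sequences have been aligned by one of the programs named above, percent identity reduces to counting matching columns. The Python sketch below is a simplified illustration and does not reproduce FASTA, Gap, Bestfit or BLAST: it assumes the alignment has already been computed and is supplied as two equal-length strings with "-" marking gaps, and it divides by the number of aligned columns, which is only one of several conventions in use. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both sequences have a gap; they carry no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment: 7 of 9 columns match.
print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))
```

Under this convention, a threshold such as "at least 95% identical" would be tested by comparing the returned value to 95.0.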

The term “substantial homology” or “substantial similarity,” when referring to a nucleic acid or fragment thereof, indicates that, when optimally aligned with appropriate nucleotide insertions or deletions with another nucleic acid (or its complementary strand), there is nucleotide sequence identity in at least about 75%, 80%, 85%, preferably at least about 90%, and more preferably at least about 95%, 96%, 97%, 98% or 99% of the nucleotide bases, as measured by any well-known algorithm of sequence identity, such as FASTA, BLAST or Gap, as discussed above.

Alternatively, substantial homology or similarity exists when a nucleic acid or fragment thereof hybridizes to another nucleic acid, to a strand of another nucleic acid, or to the complementary strand thereof, under stringent hybridization conditions. “Stringent hybridization conditions” and “stringent wash conditions” in the context of nucleic acid hybridization experiments depend upon a number of different physical parameters. Nucleic acid hybridization will be affected by such conditions as salt concentration, temperature, solvents, the base composition of the hybridizing species, length of the complementary regions, and the number of nucleotide base mismatches between the hybridizing nucleic acids, as will be readily appreciated by those skilled in the art. One having ordinary skill in the art knows how to vary these parameters to achieve a particular stringency of hybridization.

In general, “stringent hybridization” is performed at about 25° C. below the thermal melting point (T_m) for the specific DNA hybrid under a particular set of conditions. “Stringent washing” is performed at temperatures about 5° C. lower than the T_m for the specific DNA hybrid under a particular set of conditions. The T_m is the temperature at which 50% of the target sequence hybridizes to a perfectly matched probe. See Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989), page 9.51, hereby incorporated by reference. For purposes herein, “stringent conditions” are defined for solution phase hybridization as aqueous hybridization (i.e., free of formamide) in 6×SSC (where 20×SSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% SDS at 65° C. for 8-12 hours, followed by two washes in 0.2×SSC, 0.1% SDS at 65° C. for 20 minutes. It will be appreciated by the skilled worker that hybridization at 65° C. will occur at different rates depending on a number of factors including the length and percent identity of the sequences which are hybridizing.
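
The arithmetic implied by this paragraph can be sketched in a few lines of Python. The sketch is illustrative only: the function names are hypothetical, the salt concentrations follow by linear dilution from the stated 20×SSC composition, and the melting temperature passed in is an example value rather than a measurement.

```python
def ssc_molarity(fold: float) -> dict:
    """Salt concentrations of an SSC buffer, scaled from 20xSSC
    (3.0 M NaCl and 0.3 M sodium citrate)."""
    scale = fold / 20.0
    return {"NaCl_M": 3.0 * scale, "sodium_citrate_M": 0.3 * scale}


def stringent_temperatures(tm_celsius: float) -> dict:
    """General guideline: hybridize about 25 C below the T_m and
    wash about 5 C below the T_m."""
    return {
        "hybridization_C": tm_celsius - 25.0,
        "wash_C": tm_celsius - 5.0,
    }


print(ssc_molarity(6))      # hybridization buffer, 6xSSC
print(ssc_molarity(0.2))    # wash buffer, 0.2xSSC
print(stringent_temperatures(90.0))  # hypothetical T_m of 90 C
```

By this scaling, 6×SSC corresponds to about 0.9 M NaCl and 0.09 M sodium citrate, and 0.2×SSC to about 0.03 M NaCl and 0.003 M sodium citrate.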

The nucleic acids (also referred to as polynucleotides) of this present invention may include both sense and antisense strands of RNA, cDNA, genomic DNA, and synthetic forms and mixed polymers of the above. They may be modified chemically or biochemically or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those of skill in the art. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule. Other modifications can include, for example, analogs in which the ribose ring contains a bridging moiety or other structure such as the modifications found in “locked” nucleic acids.

The term “mutated” when applied to nucleic acid sequences means that nucleotides in a nucleic acid sequence may be inserted, deleted or changed compared to a reference nucleic acid sequence. A single alteration may be made at a locus (a point mutation) or multiple nucleotides may be inserted, deleted or changed at a single locus. In addition, one or more alterations may be made at any number of loci within a nucleic acid sequence. A nucleic acid sequence may be mutated by any method known in the art including but not limited to mutagenesis techniques such as “error-prone PCR” (a process for performing PCR under conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is obtained along the entire length of the PCR product; see, e.g., Leung et al., Technique, 1:11-15 (1989) and Caldwell and Joyce, PCR Methods Applic. 2:28-33 (1992)); and “oligonucleotide-directed mutagenesis” (a process which enables the generation of site-specific mutations in any cloned DNA segment of interest; see, e.g., Reidhaar-Olson and Sauer, Science 241:53-57 (1988)).

The term “attenuate” as used herein generally refers to a functional deletion, including a mutation, partial or complete deletion, insertion, or other variation made to a gene sequence or a sequence controlling the transcription of a gene sequence, which reduces or inhibits production of the gene product, or renders the gene product non-functional. In some instances a functional deletion is described as a knockout mutation. Attenuation also includes amino acid sequence changes by altering the nucleic acid sequence, placing the gene under the control of a less active promoter, down-regulation, expressing interfering RNA, ribozymes or antisense sequences that target the gene of interest, or through any other technique known in the art. In one example, the sensitivity of a particular enzyme to feedback inhibition or inhibition caused by a composition that is not a product or a reactant (non-pathway specific feedback) is lessened such that the enzyme activity is not impacted by the presence of a compound. In other instances, an enzyme that has been altered to be less active can be referred to as attenuated.

The term “deletion” as used herein refers to the removal of one or more nucleotides from a nucleic acid molecule or one or more amino acids from a protein, the regions on either side being joined together.

The term “knock-out” as used herein is intended to refer to a gene whose level of expression or activity has been reduced to zero. In some examples, a gene is knocked-out via deletion of some or all of its coding sequence. In other examples, a gene is knocked-out via introduction of one or more nucleotides into its open reading frame, which results in translation of a non-sense or otherwise non-functional protein product.
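
The second knock-out route described above, introducing nucleotides into an open reading frame, can be illustrated with a toy example in which a single inserted nucleotide shifts the reading frame and creates a premature stop codon. The Python sketch below is not part of the disclosure; the names and the short open reading frame are hypothetical and were chosen only to make the effect visible.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(orf: str) -> str:
    """Translate from the first nucleotide and stop at the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy open reading frame: ATG GCT GAA CTG GGT TAA.
wild_type = "ATGGCTGAACTGGGTTAA"
# Insert a single T after the start codon: ATG TGC TGA ...
disrupted = wild_type[:3] + "T" + wild_type[3:]

print(translate_orf(wild_type))   # MAELG
print(translate_orf(disrupted))   # MC
```

In the disrupted sequence the frame shift turns the third codon into TGA, so translation ends after two residues and the product is truncated.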

The term “vector” as used herein is intended to refer to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a “plasmid,” which generally refers to a circular double stranded DNA loop into which additional DNA segments may be ligated, but also includes linear double-stranded molecules such as those resulting from amplification by the polymerase chain reaction (PCR) or from treatment of a circular plasmid with a restriction enzyme. Other vectors include cosmids, bacterial artificial chromosomes (BAC) and yeast artificial chromosomes (YAC). Another type of vector is a viral vector, wherein additional DNA segments may be ligated into the viral genome (discussed in more detail below). Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., vectors having an origin of replication which functions in the host cell). Other vectors can be integrated into the genome of a host cell upon introduction into the host cell, and are thereby replicated along with the host genome. Moreover, certain preferred vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as “recombinant expression vectors” (or simply “expression vectors”).

“Operatively linked” or “operably linked” expression control sequences refers to a linkage in which the expression control sequence is contiguous with the gene of interest to control the gene of interest, as well as expression control sequences that act in trans or at a distance to control the gene of interest.

The term “expression control sequence” refers to polynucleotide sequences which are necessary to affect the expression of coding sequences to which they are operatively linked. Expression control sequences are sequences which control the transcription, post-transcriptional events and translation of nucleic acid sequences. Expression control sequences include appropriate transcription initiation, termination, promoter and enhancer sequences; efficient RNA processing signals such as splicing and polyadenylation signals; sequences that stabilize cytoplasmic mRNA; sequences that enhance translation efficiency (e.g., ribosome binding sites); sequences that enhance protein stability; and when desired, sequences that enhance protein secretion. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include promoter, ribosomal binding site, and transcription termination sequence. The term “control sequences” is intended to include, at a minimum, all components whose presence is essential for expression, and can also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.

The term “regulatory element” refers to any element which affects transcription or translation of a nucleic acid molecule. These include, by way of example but not limitation: regulatory proteins (e.g., transcription factors), chaperones, signaling proteins, RNAi molecules, antisense RNA molecules, microRNAs and RNA aptamers. Regulatory elements may be endogenous to the host organism. Regulatory elements may also be exogenous to the host organism. Regulatory elements may be synthetically generated regulatory elements.

The term “promoter,” “promoter element,” or “promoter sequence” as used herein, refers to a DNA sequence which when ligated to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into mRNA. A promoter is typically, though not necessarily, located 5′ (i.e., upstream) of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription. Promoters may be endogenous to the host organism. Promoters may also be exogenous to the host organism. Promoters may be synthetically generated regulatory elements.

Promoters useful for expressing the recombinant genes described herein include both constitutive and inducible/repressible promoters. Where multiple recombinant genes are expressed in an engineered organism of the invention, the different genes can be controlled by different promoters or by identical promoters in separate operons, or the expression of two or more genes may be controlled by a single promoter as part of an operon.

The term “recombinant host cell” (or simply “host cell”), as used herein, is intended to refer to a cell into which a recombinant vector has been introduced. It should be understood that such terms are intended to refer not only to the particular subject cell but to the progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term “host cell” as used herein. A recombinant host cell may be an isolated cell or cell line grown in culture or may be a cell which resides in a living tissue or organism.

The term “peptide” as used herein refers to a short polypeptide, e.g., one that is typically less than about 50 amino acids long and more typically less than about 30 amino acids long. The term as used herein encompasses analogs and mimetics that mimic structural and thus biological function.

The term “polypeptide” encompasses both naturally-occurring and non-naturally-occurring proteins, and fragments, mutants, derivatives and analogs thereof. A polypeptide may be monomeric or polymeric. Further, a polypeptide may comprise a number of different domains each of which has one or more distinct activities.

The term “isolated protein” or “isolated polypeptide” is a protein or polypeptide that by virtue of its origin or source of derivation (1) is not associated with naturally associated components that accompany it in its native state, (2) exists in a purity not found in nature, where purity can be adjudged with respect to the presence of other cellular material (e.g., is free of other proteins from the same species), (3) is expressed by a cell from a different species, or (4) does not occur in nature (e.g., it is a fragment of a polypeptide found in nature or it includes amino acid analogs or derivatives not found in nature or linkages other than standard peptide bonds). Thus, a polypeptide that is chemically synthesized or synthesized in a cellular system different from the cell from which it naturally originates will be “isolated” from its naturally associated components. A polypeptide or protein may also be rendered substantially free of naturally associated components by isolation, using protein purification techniques well known in the art. As thus defined, “isolated” does not necessarily require that the protein, polypeptide, peptide or oligopeptide so described has been physically removed from its native environment.

The term “polypeptide fragment” refers to a polypeptide that has a deletion, e.g., an amino-terminal and/or carboxy-terminal deletion compared to a full-length polypeptide. In a preferred embodiment, the polypeptide fragment is a contiguous sequence in which the amino acid sequence of the fragment is identical to the corresponding positions in the naturally-occurring sequence. Fragments typically are at least 5, 6, 7, 8, 9 or 10 amino acids long, preferably at least 12, 14, 16 or 18 amino acids long, more preferably at least 20 amino acids long, more preferably at least 25, 30, 35, 40 or 45 amino acids, even more preferably at least 50 or 60 amino acids long, and even more preferably at least 70 amino acids long.

A protein has “homology” or is “homologous” to a second protein if the nucleic acid sequence that encodes the protein has a similar sequence to the nucleic acid sequence that encodes the second protein. Alternatively, a protein has homology to a second protein if the two proteins have “similar” amino acid sequences. (Thus, the term “homologous proteins” is defined to mean that the two proteins have similar amino acid sequences.) As used herein, homology between two regions of amino acid sequence (especially with respect to predicted structural similarities) is interpreted as implying similarity in function.

When “homologous” is used in reference to proteins or peptides, it is recognized that residue positions that are not identical often differ by conservative amino acid substitutions. A “conservative amino acid substitution” is one in which an amino acid residue is substituted by another amino acid residue having a side chain (R group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a conservative amino acid substitution will not substantially change the functional properties of a protein. In cases where two or more amino acid sequences differ from each other by conservative substitutions, the percent sequence identity or degree of homology may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. See, e.g., Pearson, 1994, Methods Mol. Biol. 24:307-31 and 25:365-89 (herein incorporated by reference).

The twenty conventional amino acids and their abbreviations follow conventional usage. See Immunology-A Synthesis (Golub and Green eds., Sinauer Associates, Sunderland, Mass., 2nd ed. 1991), which is incorporated herein by reference. Stereoisomers (e.g., D-amino acids) of the twenty conventional amino acids, unnatural amino acids such as α,α-disubstituted amino acids, N-alkyl amino acids, and other unconventional amino acids may also be suitable components for polypeptides of the present invention. Examples of unconventional amino acids include: 4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine, ε-N-acetyllysine, O-phosphoserine, N-acetylserine, N-formylmethionine, 3-methylhistidine, 5-hydroxylysine, N-methylarginine, and other similar amino acids and imino acids (e.g., 4-hydroxyproline). In the polypeptide notation used herein, the left-hand end corresponds to the amino terminal end and the right-hand end corresponds to the carboxy-terminal end, in accordance with standard usage and convention.

The following six groups each contain amino acids that are conservative substitutions for one another: 1) Serine (S), Threonine (T); 2) Aspartic Acid (D), Glutamic Acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Alanine (A), Valine (V); and 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
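By way of non-limiting illustration, the six groups listed above can be expressed as a simple lookup. The following Python sketch only restates those groupings; the names CONSERVATIVE_GROUPS and is_conservative are hypothetical and are not taken from any sequence analysis package.

```python
# Six conservative-substitution groups, transcribed from the list above.
# The names below are illustrative only.
CONSERVATIVE_GROUPS = [
    set("ST"),     # 1) Serine, Threonine
    set("DE"),     # 2) Aspartic Acid, Glutamic Acid
    set("NQ"),     # 3) Asparagine, Glutamine
    set("RK"),     # 4) Arginine, Lysine
    set("ILMAV"),  # 5) Isoleucine, Leucine, Methionine, Alanine, Valine
    set("FYW"),    # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(a: str, b: str) -> bool:
    """Return True if residues a and b differ but share one of the six groups."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

For example, is_conservative("S", "T") evaluates to True, whereas is_conservative("S", "D") evaluates to False because serine and aspartic acid fall in different groups.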

Sequence homology for polypeptides, which is sometimes also referred to as percent sequence identity, is typically measured using sequence analysis software. See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group (GCG), University of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wis. 53705. Protein analysis software matches similar sequences using a measure of homology assigned to various substitutions, deletions and other modifications, including conservative amino acid substitutions. For instance, GCG contains programs such as “Gap” and “Bestfit” which can be used with default parameters to determine sequence homology or sequence identity between closely related polypeptides, such as homologous polypeptides from different species of organisms or between a wild-type protein and a mutein thereof. See, e.g., GCG Version 6.1.

A useful algorithm when comparing a particular polypeptide sequence to a database containing a large number of sequences from different organisms is the computer program BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990); Gish and States, Nature Genet. 3:266-272 (1993); Madden et al., Meth. Enzymol. 266:131-141 (1996); Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997); Zhang and Madden, Genome Res. 7:649-656 (1997)), especially blastp or tblastn (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)).

Preferred parameters for BLASTp are: Expectation value: 10 (default); Filter: seg (default); Cost to open a gap: 11 (default); Cost to extend a gap: 1 (default); Max. alignments: 100 (default); Word size: 11 (default); No. of descriptions: 100 (default); Penalty Matrix: BLOSUM62.

The length of polypeptide sequences compared for homology will generally be at least about 16 amino acid residues, usually at least about 20 residues, more usually at least about 24 residues, typically at least about 28 residues, and preferably more than about 35 residues. When searching a database containing sequences from a large number of different organisms, it is preferable to compare amino acid sequences. Database searching using amino acid sequences can be measured by algorithms other than blastp known in the art. For instance, polypeptide sequences can be compared using FASTA, a program in GCG Version 6.1. FASTA provides alignments and percent sequence identity of the regions of the best overlap between the query and search sequences. Pearson, Methods Enzymol. 183:63-98 (1990) (incorporated by reference herein). For example, percent sequence identity between amino acid sequences can be determined using FASTA with its default parameters (a word size of 2 and the PAM250 scoring matrix), as provided in GCG Version 6.1, herein incorporated by reference.
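For illustration only, percent sequence identity over an existing pairwise alignment can be computed as sketched below in Python. This is not the BLAST or FASTA algorithm and performs no alignment itself; it assumes two already-aligned strings of equal length in which “-” marks a gap, and it adopts one common convention (identical columns divided by columns in which neither sequence has a gap). The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give different values for gapped alignments, so the denominator should be stated whenever a threshold such as “at least 95% sequence identity” is applied.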

Throughout this specification and claims, the word “comprise” or variations such as “comprises” or “comprising,” will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

Exemplary methods and materials are described below, although methods and materials similar or equivalent to those described herein can also be used in the practice of the present invention and will be apparent to those of skill in the art. All publications and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. The materials, methods, and examples are illustrative only and not intended to be limiting.

Overview

Provided herein are recombinant strains and methods of producing recombinant strains to increase production of a full-length desired product in a target cell, e.g., by reducing protease degradation.

In some embodiments, to attenuate a protease activity in Pichia pastoris, the genes encoding these enzymes are inactivated or mutated to reduce or eliminate activity. This can be done through mutations or insertions into the gene itself or through modification of a gene regulatory element, using standard yeast genetics techniques. Examples of such techniques include gene replacement through double homologous recombination, in which homologous regions flanking the gene to be inactivated are cloned in a vector flanking a selectable marker gene (such as an antibiotic resistance gene or a gene complementing an auxotrophy of the yeast strain).

Alternatively, the homologous regions can be PCR-amplified and linked through overlapping PCR to the selectable marker gene. Subsequently, such DNA fragments are transformed into Pichia pastoris through methods known in the art, e.g., electroporation. Transformants that then grow under selective conditions are analyzed for the gene disruption event through standard techniques, e.g., PCR on genomic DNA or Southern blot. In an alternative experiment, gene inactivation can be achieved through single homologous recombination, in which case, e.g., the 5′ end of the gene's ORF is cloned on a promoterless vector also containing a selectable marker gene. Upon linearization of such vector through digestion with a restriction enzyme only cutting the vector in the target-gene homologous fragment, such vector is transformed into Pichia pastoris. Integration at the target gene site is confirmed through PCR on genomic DNA or Southern blot. In this way, a duplication of the gene fragment cloned on the vector is achieved in the genome, resulting in two copies of the target gene locus: a first copy in which the ORF is incomplete, thus resulting in the expression (if at all) of a shortened, inactive protein, and a second copy which has no promoter to drive transcription.

Alternatively, transposon mutagenesis is used to inactivate the target gene. A library of such mutants can be screened through PCR for insertion events in the target gene.

The functional phenotype (i.e., deficiencies) of an engineered/knockout strain can be assessed using techniques known in the art. For example, a deficiency of an engineered strain in protease activity can be ascertained using any of a variety of methods known in the art, such as an assay of hydrolytic activity of chromogenic protease substrates or band shifts of substrate proteins for the selected protease, among others.

Attenuation of a protease activity described herein can be achieved through mechanisms other than a knockout mutation. For example, a desired protease can be attenuated via amino acid sequence changes by altering the nucleic acid sequence, placing the gene under the control of a less active promoter, down-regulation, expressing interfering RNA, ribozymes or antisense sequences that target the gene of interest, or through any other technique known in the art. In preferred strains, the protease activity of proteases encoded at PAS_chr4_0584 (YPS1-1) and PAS_chr3_1157 (YPS1-2) (e.g., polypeptides comprising SEQ ID NO: 67 and 68) is attenuated by any of the methods described above. In some aspects, the invention is directed to methylotrophic yeast strains, especially Pichia pastoris strains, wherein a YPS1-1 and a YPS1-2 gene (e.g., as set forth in SEQ ID NO: 1 and SEQ ID NO: 2) have been inactivated. In some embodiments, additional protease encoding genes may also be knocked out in accordance with the methods provided herein to further reduce protease degradation of a desired protein product expressed by the strain.

Production of Recombinant Strains

Provided herein are methods of transforming a strain to reduce protease activity, e.g., using vectors to deliver recombinant genes or to knock out or otherwise attenuate endogenous genes as desired. These vectors can take the form of a vector backbone containing a replication origin and a selection marker (typically antibiotic resistance, although many other methods are possible), or a linear fragment that enables incorporation into the target cell's chromosome. The vectors should correspond to the organism and insertion method chosen.

Once the elements of a vector are selected, construction of the vector can be performed in many different ways. In an embodiment, a DNA synthesis service or a method to individually make every vector may be used.

Once the DNA for each vector (including the additional elements required for insertion and operation) is acquired, it must be assembled. There are many possible assembly methods including (but not limited to) restriction enzyme cloning, blunt-end ligation, and overlap assembly [see, e.g., Gibson, D. G., et al., Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature methods, 6(5), 343-345 (2009), and GeneArt Kit (http://tools.invitrogen.com/content/sfs/manuals/geneart_seamless_cloning_and_assembly_man.pdf)]. Overlap assembly provides a method to ensure all of the elements get assembled in the correct position and do not introduce any undesired sequences.

The vectors generated above can be inserted into target cells using standard molecular biology techniques, e.g., molecular cloning. In an embodiment, the target cells are already engineered or selected such that they already contain the genes required to make the desired product, although this may also be done during or after further vector insertion.

Depending on the organism and library element type (plasmid or genomic insertion), several known methods of inserting the vector comprising DNA to incorporate into the cells may be used. These may include, for example, transformation of microorganisms able to take up and replicate DNA from the local environment, transformation by electroporation or chemical means, transduction with a virus or phage, mating of two or more cells, or conjugation from a different cell.

Several methods are known in the art to introduce recombinant DNA in bacterial cells that include but are not limited to transformation, transduction, and electroporation, see Sambrook, et al., Molecular Cloning: A Laboratory Manual (1989), Second Edition, Cold Spring Harbor Press, Plainview, N.Y. Non-limiting examples of commercial kits and bacterial host cells for transformation include NovaBlue Singles™ (EMD Chemicals Inc., NJ, USA), Max Efficiency® DH5α™, One Shot® BL21 (DE3) E. coli cells, One Shot® BL21 (DE3) pLys E. coli cells (Invitrogen Corp., Carlsbad, Calif., USA), XL1-Blue competent cells (Stratagene, CA, USA). Non-limiting examples of commercial kits and bacterial host cells for electroporation include Zappers™ electrocompetent cells (EMD Chemicals Inc., NJ, USA), XL1-Blue Electroporation-competent cells (Stratagene, CA, USA), ElectroMAX™ A. tumefaciens LBA4404 Cells (Invitrogen Corp., Carlsbad, Calif., USA).

Several methods are known in the art to introduce recombinant nucleic acid in eukaryotic cells. Exemplary methods include transfection, electroporation, liposome mediated delivery of nucleic acid, and microinjection into the host cell, see Sambrook, et al., Molecular Cloning: A Laboratory Manual (1989), Second Edition, Cold Spring Harbor Press, Plainview, N.Y. Non-limiting examples of commercial kits and reagents for transfection of recombinant nucleic acid to eukaryotic cells include Lipofectamine™ 2000, Optifect™ Reagent, Calcium Phosphate Transfection Kit (Invitrogen Corp., Carlsbad, Calif., USA), GeneJammer® Transfection Reagent, LipoTAXI® Transfection Reagent (Stratagene, CA, USA). Alternatively, recombinant nucleic acid may be introduced into insect cells (e.g., sf9, sf21, High Five™) by using baculoviral vectors.

Transformed cells are isolated so that each clone can be tested separately. In an embodiment, this is done by spreading the culture on one or more plates of culture media containing a selective agent (or lack of one) that will ensure that only transformed cells survive and reproduce. This specific agent may be an antibiotic (if the library contains an antibiotic resistance marker), a missing metabolite (for auxotroph complementation), or other means of selection. The cells are grown into individual colonies, each of which contains a single clone.

Colonies are screened for desired production of a protein, metabolite, or other product, or for reduction in protease activity. In an embodiment, screening identifies recombinant cells having the highest (or high enough) product production titer or efficiency. This includes a decreased proportion of degradation products or an increased total amount of full-length desired polypeptides collected from a cell culture.

This assay can be performed by growing individual clones, one per well, in multi-well culture plates. Once the cells have reached an appropriate biomass density, they are induced with methanol. After a period of time, typically 24-72 hours of induction, the cultures are harvested by spinning in a centrifuge to pellet the cells and removing the supernatant. The supernatant from each culture can then be tested for protease activity and/or protein degradation.

Silk Sequences

In some embodiments, the modified strains with reduced protease activity described herein recombinantly express a silk-like polypeptide sequence. In some embodiments, the silk-like polypeptide sequences are 1) block copolymer polypeptide compositions generated by mixing and matching repeat domains derived from silk polypeptide sequences and/or 2) block copolymer polypeptides of sufficiently large size (approximately 40 kDa) to form useful fibers, recombinantly expressed by secretion from an industrially scalable microorganism. Large (approximately 40 kDa to approximately 100 kDa) block copolymer polypeptides engineered from silk repeat domain fragments, including sequences from almost all published amino acid sequences of spider silk polypeptides, can be expressed in the modified microorganisms described herein. In some embodiments, silk polypeptide sequences are matched and designed to produce highly expressed and secreted polypeptides capable of fiber formation. In some embodiments, knock-out of protease genes or reduction of protease activity in the host modified strain reduces degradation of the silk-like polypeptides.

Provided herein, in several embodiments, are compositions for expression and secretion of block copolymers engineered from a combinatorial mix of silk polypeptide domains across the silk polypeptide sequence space, wherein the block copolymers have minimal degradation. In some embodiments provided herein are methods of secreting block copolymers in scalable organisms (e.g., yeast, fungi, and gram positive bacteria) with minimal degradation. In some embodiments, the block copolymer polypeptide comprises 0 or more N-terminal domains (NTD), 1 or more repeat domains (REP), and 0 or more C-terminal domains (CTD). In some aspects of the embodiment, the block copolymer polypeptide is >100 amino acids of a single polypeptide chain. In some embodiments, the block copolymer polypeptide comprises a domain that is at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to a sequence of a block copolymer polypeptide as disclosed in International Publication No. WO/2015/042164, “Methods and Compositions for Synthesizing Improved Silk Fibers,” incorporated by reference in its entirety.

Several types of native spider silks have been identified. The mechanical properties of each natively spun silk type are believed to be closely connected to the molecular composition of that silk. See, e.g., Garb, J. E., et al., Untangling spider silk evolution with spidroin terminal domains, BMC Evol. Biol., 10:243 (2010); Bittencourt, D., et al., Protein families, natural history and biotechnological aspects of spider silk, Genet. Mol. Res., 11:3 (2012); Rising, A., et al., Spider silk proteins: recent advances in recombinant production, structure-function relationships and biomedical applications, Cell. Mol. Life Sci., 68:2, pg. 169-184 (2011); and Humenik, M., et al., Spider silk: understanding the structure-function relationship of a natural fiber, Prog. Mol. Biol. Transl. Sci., 103, pg. 131-85 (2011). For example:

Aciniform (AcSp) silks tend to have high toughness, a result of moderately high strength coupled with moderately high extensibility. AcSp silks are characterized by large block (“ensemble repeat”) sizes that often incorporate motifs of poly serine and GPX. Tubuliform (TuSp or Cylindrical) silks tend to have large diameters, with modest strength and high extensibility. TuSp silks are characterized by their poly serine and poly threonine content, and short tracts of poly alanine. Major Ampullate (MaSp) silks tend to have high strength and modest extensibility. MaSp silks can be one of two subtypes: MaSp1 and MaSp2. MaSp1 silks are generally less extensible than MaSp2 silks, and are characterized by poly alanine, GX, and GGX motifs. MaSp2 silks are characterized by poly alanine, GGX, and GPX motifs. Minor Ampullate (MiSp) silks tend to have modest strength and modest extensibility. MiSp silks are characterized by GGX, GA, and poly A motifs, and often contain spacer elements of approximately 100 amino acids. Flagelliform (Flag) silks tend to have very high extensibility and modest strength. Flag silks are usually characterized by GPG, GGX, and short spacer motifs.

The properties of each silk type can vary from species to species, and spiders leading distinct lifestyles (e.g., sedentary web spinners vs. vagabond hunters) or that are evolutionarily older may produce silks that differ in properties from the above descriptions (for descriptions of spider diversity and classification, see Hormiga, G., and Griswold, C. E., Systematics, phylogeny, and evolution of orb-weaving spiders, Annu. Rev. Entomol. 59, pg. 487-512 (2014); and Blackledge, T. A. et al., Reconstructing web evolution and spider diversification in the molecular era, Proc. Natl. Acad. Sci. U.S.A., 106:13, pg. 5229-5234 (2009)). However, synthetic block copolymer polypeptides having sequence similarity and/or amino acid composition similarity to the repeat domains of native silk proteins can be used to manufacture, on commercial scales, consistent silk-like fibers that recapitulate the properties of corresponding natural silk fibers.

In some embodiments, a list of putative silk sequences can be compiled by searching GenBank for relevant terms, e.g., “spidroin,” “fibroin,” and “MaSp,” and those sequences can be pooled with additional sequences obtained through independent sequencing efforts. Sequences are then translated into amino acids, filtered for duplicate entries, and manually split into domains (NTD, REP, CTD). In some embodiments, candidate amino acid sequences are reverse translated into a DNA sequence optimized for expression in Pichia (Komagataella) pastoris. The DNA sequences are each cloned into an expression vector and transformed into Pichia (Komagataella) pastoris. In some embodiments, various silk domains demonstrating successful expression and secretion are subsequently assembled in combinatorial fashion to build silk molecules capable of fiber formation.

Silk polypeptides are characteristically composed of a repeat domain (REP) flanked by non-repetitive regions (e.g., C-terminal and N-terminal domains). In an embodiment, both the C-terminal and N-terminal domains are between 75 and 350 amino acids in length. The repeat domain exhibits a hierarchical architecture. The repeat domain comprises a series of blocks (also called repeat units). The blocks are repeated, sometimes perfectly and sometimes imperfectly (making up a quasi-repeat domain), throughout the silk repeat domain. The length and composition of blocks vary among different silk types and across different species. Table 1 lists examples of block sequences from selected species and silk types, with further examples presented in Rising, A. et al., Spider silk proteins: recent advances in recombinant production, structure-function relationships and biomedical applications, Cell. Mol. Life Sci., 68:2, pg. 169-184 (2011); and Gatesy, J. et al., Extreme diversity, conservation, and convergence of spider silk fibroin sequences, Science, 291:5513, pg. 2603-2605 (2001). In some cases, blocks may be arranged in a regular pattern, forming larger macro-repeats that appear multiple times (usually 2-8) in the repeat domain of the silk sequence. Repeated blocks inside a repeat domain or macro-repeat, and repeated macro-repeats within the repeat domain, may be separated by spacing elements. In some embodiments, block sequences comprise a glycine rich region followed by a polyA region. In some embodiments, short (~1-10) amino acid motifs appear multiple times inside of blocks. For the purpose of this invention, blocks from different natural silk polypeptides can be selected without reference to circular permutation (i.e., identified blocks that are otherwise similar between silk polypeptides may not align due to circular permutation). Thus, for example, a “block” of SGAGG (SEQ ID NO: 494) is, for the purposes of the present invention, the same as GSGAG (SEQ ID NO: 495) and the same as GGSGA (SEQ ID NO: 496); they are all just circular permutations of each other. The particular permutation selected for a given silk sequence can be dictated by convenience (usually starting with a G) more than anything else. Silk sequences obtained from the NCBI database can be partitioned into blocks and non-repetitive regions.
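As a non-limiting sketch, the circular-permutation equivalence of blocks described above can be tested in Python by checking whether one block occurs within the other block concatenated with itself. The function names below are hypothetical and are offered only to illustrate the comparison; they are not part of any disclosed pipeline.

```python
def is_circular_permutation(block_a: str, block_b: str) -> bool:
    """Return True if block_b is a rotation of block_a."""
    return len(block_a) == len(block_b) and block_b in block_a + block_a


def rotations_starting_with(block: str, residue: str = "G") -> list:
    """List every rotation of a block that begins with the given residue."""
    rotations = [block[i:] + block[:i] for i in range(len(block))]
    return [r for r in rotations if r.startswith(residue)]
```

Under this check, SGAGG, GSGAG and GGSGA are treated as the same block, and rotations_starting_with("SGAGG") lists the permutations of that block that begin with glycine, from which one may be chosen for convenience.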

TABLE 1 Samples of Block Sequences

Species | Silk Type | Representative Block Amino Acid Sequence

Aliatypus gulosus | Fibroin 1 | GAASSSSTIITTKSASASAAADASAAATASAASRSSANAAASAFAQSFSSILLESGYFCSIFGSSISSSYAAAIASAASRAAAESNGYTTHAYACAKAVASAVERVTSGADAYAYAQAISDALSHALLYTGRLNTANANSLASAFAYAFANAAAQASASSASAGAASASGAASASGAGSAS (SEQ ID NO: 497)

Plectreurys tristis | Fibroin 1 | GAGAGAGAGAGAGAGAGSGASTSVSTSSSSGSGAGAGAGSGAGSGAGAGSGAGAGAGAGGAGAGFGSGLGLGYGVGLSSAQAQAQAQAAAQAQAQAQAQAYAAAQAQAQAQAQAQAAAAAAAAAAA (SEQ ID NO: 498)

Plectreurys tristis | Fibroin 4 | GAAQKQPSGESSVATASAAATSVTSGGAPVGKPGVPAPIFYPQGPLQQGPAPGPSNVQPGTSQQGPIGGVGGSNAFSSSFASALSLNRGFTEVISSASATAVASAFQKGLAPYGTAFALSAASAAADAYNSIGSGANAFAYAQAFARVLYPLVQQYGLSSSAKASAFASAIASSFSSGTSGQGPSIGQQQPPVTISAASASAGASAAAVGGGQVGQGPYGGQQQSTAASASAAAATATS (SEQ ID NO: 499)

Araneus gemmoides | TuSp | GNVGYQLGLKVANSLGLGNAQALASSLSQAVSAVGVGASSNAYANAVSNAVGQVLAGQGILNAANAGSLASSFASALSSSAASVASQSASQSQAASQSQAAASAFRQAASQSASQSDSRAGSQSSTKTTSTSTSGSQADSRSASSSASQASASAFAQQSSASLSSSSSFSSAFSSATSISAV (SEQ ID NO: 500)

Argiope aurantia | TuSp | GSLASSFASALSASAASVASSAAAQAASQSQAAASAFSRAASQSASQSAARSGAQSISTTTTTSTAGSQAASQSASSAASQASASSFARASSASLAASSSFSSAFSSANSLSALGNVGYQLGFNVANNLGIGNAAGLGNALSQAVSSVGVGASSSTYANAVSNAVGQFLAGQGILNAANA (SEQ ID NO: 501)

Deinopis spinosa | TuSp | GASASAYASAISNAVGPYLYGLGLFNQANAASFASSFASAVSSAVASASASAASSAYAQSAAAQAQAASSAFSQAAAQSAAAASAGASAGAGASAGAGAVAGAGAVAGAGAVAGASAAAASQAAASSSASAVASAFAQSASYALASSSAFANAFASATSAGYLGSLAYQLGLTTAYNLGLSNAQAFASTLSQAVTGVGL (SEQ ID NO: 502)

Nephila clavipes | TuSp | GATAASYGNALSTAAAQFFATAGLLNAGNASALASSFARAFSASAESQSFAQSQAFQQASAFQQAASRSASQSAAEAGSTSSSTTTTTSAARSQAASQSASSSYSSAFAQAASSSLATSSALSRAFSSVSSASAASSLAYSIGLSAARSLGIADAAGLAGVLARAAGALGQ (SEQ ID NO: 503)

Argiope trifasciata | Flag | GGAPGGGPGGAGPGGAGFGPGGGAGFGPGGGAGFGPGGAAGGPGGPGGPGGPGGAGGYGPGGAGGYGPGGVGPGGAGGYGPGGAGGYGPGGSGPGGAGPGGAGGEGPVTVDVDVTVGPEGVGGGPGGAGPGGAGFGPGGGAGFGPGGAPGAPGGPGGPGGPGGPGGPGGVGPGGAGGYGPGGAGGVGPAGTGGFGPGGAGGFGPGGAGGFGPGGAGGFGPAGAGGYGPGGVGPGGAGGFGPGGVGPGGSGPGGAGGEGPVTVDVDVSV (SEQ ID NO: 504)

Nephila clavipes | Flag | GVSYGPGGAGGPYGPGGPYGPGGEGPGGAGGPYGPGGVGPGGSGPGGYGPGGAGPGGYGPGGSGPGGYGPGGSGPGGYGPGGSGPGGYGPGGSGPGGYGPGGYGPGGSGPGGSGPGGSGPGGYGPGGTGPGGSGPGGYGPGGSGPGGSGPGGYGPGGSGPGGFGPGGSGPGGYGPGGSGPGGAGPGGVGPGGFGPGGAGPGGAAPGGAGPGGAGPGGAGPGGAGPGGAGPGGAGPGGAGGAGGAGGSGGAGGSGGTTIIEDLDITIDGADGPITISEELPISGAGGSGPGGAGPGGVGPGGSGPGGVGPGGSGPGGVGPGGSGPGGVGPGGAGGPYGPGGSGPGGAGGAGGPGGAYGPGGSYGPGGSGGPGGAGGPYGPGGEGPGGAGGPYGPGGAGGPYGPGGAGGPYGPGGEGGPYGP (SEQ ID NO: 505)

Latrodectus hesperus | AcSp | GINVDSDIGSVTSLILSGSTLQMTIPAGGDDLSGGYPGGFPAGAQPSGGAPVDFGGPSAGGDVAAKLARSLASTLASSGVFRAAFNSRVSTPVAVQLTDALVQKIASNLGLDYATASKLRKASQAVSKVRMGSDTNAYALAISSALAEVLSSSGKVADANINQIAPQLASGIVLGVSTTAPQFGVDLSSINVNLDISNVARNMQASIQGGPAPITAEGPDFGAGYPGGAPTDLSGLDMGAPSDGSRGGDATAKLLQALVPALLKSDVFRAIYKRGTRKQVVQYVTNSALQQAASSLGLDASTISQLQTKATQALSSVSADSDSTAYAKAFGLAIAQVLGTSGQVNDANVNQIGAKLATGILRGSSAVAPRLGIDLS (SEQ ID NO: 506)

Argiope trifasciata | AcSp | GAGYTGPSGPSTGPSGYPGPLGGGAPFGQSGFGGSAGPQGGFGATGGASAGLISRVANALANTSTLRTVLRTGVSQQIASSVVQRAAQSLASTLGVDGNNLARFAVQAVSRLPAGSDTSAYAQAFSSALFNAGVLNASNIDTLGSRVLSALLNGVSSAAQGLGINVDSGSVQSDISSSSSFLSTSSSSASYSQASASSTS (SEQ ID NO: 507)

Uloborus diversus | AcSp | GASAADIATAIAASVATSLQSNGVLTASNVSQLSNQLASYVSSGLSSTASSLGIQLGASLGAGFGASAGLSASTDISSSVEATSASTLSSSASSTSVVSSINAQLVPALAQTAVLNAAFSNINTQNAIRIAELLTQQVGRQYGLSGSDVATASSQIRSALYSVQQGSASSAYVSAIVGPLITALSSRGVVNASNSSQIASSLATAILQFTANVAPQFGISIPTSAVQSDLSTISQSLTAISSQTSSSVDSSTSAFGGISGPSGPSPYGPQPSGPTFGPGPSLSGLTGFTATFASSFKSTLASSTQFQLIAQSNLDVQTRSSLISKVLINALSSLGISASVASSIAASSSQSLLSVSA (SEQ ID NO: 508)

Euprosthenops australis | MaSp1 | GGQGGQGQGRYGQGAGSSAAAAAAAAAAAAAA (SEQ ID NO: 509)

Tetragnatha kauaiensis | MaSp1 | GGLGGGQGAGQGGQQGAGQGGYGSGLGGAGQGASAAAAAAAA (SEQ ID NO: 510)

Argiope aurantia | MaSp2 | GGYGPGAGQQGPGSQGPGSGGQQGPGGLGPYGPSAAAAAAAA (SEQ ID NO: 511)

Deinopis spinosa | MaSp2 | GPGGYGGPGQQGPGQGQYGPGTGQQGQGPSGQQGPAGAAAAAAAAA (SEQ ID NO: 512)

Nephila clavata | MaSp2 | GPGGYGLGQQGPGQQGPGQQGPAGYGPSGLSGPGGAAAAAAA (SEQ ID NO: 513)

Fiber-forming block copolymer polypeptides from the blocks and/or macro-repeat domains, according to certain embodiments of the invention, are described in International Publication No. WO/2015/042164, incorporated by reference. Natural silk sequences obtained from a protein database such as GenBank or through de novo sequencing are broken up by domain (N-terminal domain, repeat domain, and C-terminal domain). The N-terminal domain and C-terminal domain sequences selected for the purpose of synthesis and assembly into fibers include natural amino acid sequence information and other modifications described herein. The repeat domain is decomposed into repeat sequences containing representative blocks, usually 1-8 depending upon the type of silk, that capture critical amino acid information while reducing the size of the DNA encoding the amino acids into a readily synthesizable fragment. In some embodiments, a properly formed block copolymer polypeptide comprises at least one repeat domain comprising at least 1 repeat sequence, and is optionally flanked by an N-terminal domain and/or a C-terminal domain.

In some embodiments, a repeat domain comprises at least one repeat sequence. In some embodiments, the repeat sequence is 150-300 amino acid residues. In some embodiments, the repeat sequence comprises a plurality of blocks. In some embodiments, the repeat sequence comprises a plurality of macro-repeats. In some embodiments, a block or a macro-repeat is split across multiple repeat sequences.

In some embodiments, the repeat sequence starts with a glycine, and cannot end with phenylalanine (F), tyrosine (Y), tryptophan (W), cysteine (C), histidine (H), asparagine (N), methionine (M), or aspartic acid (D) to satisfy DNA assembly requirements. In some embodiments, some of the repeat sequences can be altered as compared to native sequences.

In some embodiments, the repeat sequences can be altered such as by addition of a serine to the C terminus of the polypeptide (to avoid terminating in F, Y, W, C, H, N, M, or D). In some embodiments, the repeat sequence can be modified by filling in an incomplete block with homologous sequence from another block. In some embodiments, the repeat sequence can be modified by rearranging the order of blocks or macro-repeats.
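The terminal-residue constraints and the serine addition described above can be sketched, for illustration only, as follows. The Python names FORBIDDEN_C_TERMINAL, satisfies_assembly_rules and append_serine_if_needed are hypothetical, and the sketch addresses only the starting glycine and the C-terminal residue; it does not model block filling or reordering.

```python
# Residues a repeat sequence may not end with, per the text above.
FORBIDDEN_C_TERMINAL = set("FYWCHNMD")


def satisfies_assembly_rules(seq: str) -> bool:
    """Check that a repeat sequence starts with G and ends with a permitted residue."""
    return bool(seq) and seq[0] == "G" and seq[-1] not in FORBIDDEN_C_TERMINAL


def append_serine_if_needed(seq: str) -> str:
    """Append a C-terminal serine when the sequence ends in a forbidden residue."""
    if seq and seq[-1] in FORBIDDEN_C_TERMINAL:
        return seq + "S"
    return seq
```

A sequence ending in tyrosine, for instance, fails satisfies_assembly_rules until append_serine_if_needed has added the terminal serine; a sequence that already ends in a permitted residue is returned unchanged.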

In some embodiments, non-repetitive N- and C-terminal domains can be selected for synthesis. In some embodiments, N-terminal domains can be modified by removal of the leading signal sequence, e.g., as identified by SignalP (Petersen, T. N., et al., SignalP 4.0: discriminating signal peptides from transmembrane regions, Nat. Methods, 8:10, pg. 785-786 (2011)).

In some embodiments, the N-terminal domain, repeat sequence, or C-terminal domain sequences can be derived from Agelenopsis aperta, Aliatypus gulosus, Aphonopelma seemanni, Aptostichus sp. AS217, Aptostichus sp. AS220, Araneus diadematus, Araneus gemmoides, Araneus ventricosus, Argiope amoena, Argiope argentata, Argiope bruennichi, Argiope trifasciata, Atypoides riversi, Avicularia juruensis, Bothriocyrtum californicum, Deinopis spinosa, Diguetia canities, Dolomedes tenebrosus, Euagrus chisoseus, Euprosthenops australis, Gasteracantha mammosa, Hypochilus thorelli, Kukulcania hibernalis, Latrodectus hesperus, Megahexura fulva, Metepeira grandiosa, Nephila antipodiana, Nephila clavata, Nephila clavipes, Nephila madagascariensis, Nephila pilipes, Nephilengys cruentata, Parawixia bistriata, Peucetia viridans, Plectreurys tristis, Poecilotheria regalis, Tetragnatha kauaiensis, or Uloborus diversus.

In some embodiments, the silk polypeptide nucleotide coding sequence can be operatively linked to an alpha mating factor nucleotide coding sequence. In some embodiments, the silk polypeptide nucleotide coding sequence can be operatively linked to another endogenous or heterologous secretion signal coding sequence. In some embodiments, the silk polypeptide nucleotide coding sequence can be operatively linked to a 3×FLAG nucleotide coding sequence. In some embodiments, the silk polypeptide nucleotide coding sequence is operatively linked to other affinity tags such as 6-8 His residues (SEQ ID NO: 520).

Silk-Like Polypeptides

In some embodiments, the P. pastoris strains disclosed herein have been modified to express a silk-like polypeptide. Methods of manufacturing preferred embodiments of silk-like polypeptides are provided in WO 2015/042164, especially at Paragraphs 114-134, incorporated herein by reference. Disclosed therein are synthetic proteinaceous copolymers based on recombinant spider silk protein fragment sequences derived from MaSp2, such as from the species Argiope bruennichi. Silk-like polypeptides are described that include two to twenty repeat units, in which a molecular weight of each repeat unit is greater than about 20 kDa. Within each repeat unit of the copolymer are more than about 60 amino acid residues that are organized into a number of “quasi-repeat units.” In some embodiments, the repeat unit of a polypeptide described in this disclosure has at least 95% sequence identity to a MaSp2 dragline silk protein sequence.

In some embodiments, each “repeat unit” of a silk-like polypeptide comprises from two to twenty “quasi-repeat” units (i.e., n₃ is from 2 to 20). Quasi-repeats do not have to be exact repeats. Each repeat can be made up of concatenated quasi-repeats. Equation 1 shows the composition of a repeat unit according to the present disclosure and that incorporated by reference from WO 2015/042164. Each silk-like polypeptide can have one or more repeat units as defined by Equation 1.

{GGY-[GPG-X₁]_(n₁)-GPS-(A)_(n₂)}_(n₃) (SEQ ID NO: 514)  (Equation 1)

The variable compositional element X₁ (termed a “motif”) is any one of the amino acid sequences shown in Equation 2, and X₁ varies randomly within each quasi-repeat unit.

X₁ = SGGQQ (SEQ ID NO: 515) or GAGQQ (SEQ ID NO: 516) or GQGPY (SEQ ID NO: 517) or AGQQ (SEQ ID NO: 518) or SQ  (Equation 2)

Referring again to Equation 1, the compositional element of a quasi-repeat unit represented by “GGY-[GPG-X₁]_(n₁)-GPS” (SEQ ID NO: 521) in Equation 1 is referred to as a “first region.” A quasi-repeat unit is formed, in part, by repeating from 4 to 8 times the first region within the quasi-repeat unit. That is, the value of n₁ indicates the number of first region units that are repeated within a single quasi-repeat unit, the value of n₁ being any one of 4, 5, 6, 7 or 8. The compositional element represented by “(A)_(n₂)” (SEQ ID NO: 522) (i.e., a polyA sequence) is referred to as a “second region” and is formed by repeating within each quasi-repeat unit the amino acid sequence “A” n₂ times (SEQ ID NO: 522). That is, the value of n₂ indicates the number of second region units that are repeated within a single quasi-repeat unit, the value of n₂ being any one of 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20. In some embodiments, the repeat unit of a polypeptide of this disclosure has at least 95% sequence identity to a sequence containing quasi-repeats described by Equations 1 and 2. In some embodiments, the repeat unit of a polypeptide of this disclosure has at least 80%, or at least 90%, or at least 95%, or at least 99% sequence identity to a sequence containing quasi-repeats described by Equations 1 and 2.
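For illustration only, the composition rule of Equations 1 and 2 can be sketched in Python. The function names, the seeded random choice of X₁, and the regular expression are assumptions made for this sketch and are not taken from the disclosure or from WO 2015/042164. The sketch follows the bracketing of Equation 1, in which n₁ counts the [GPG-X₁] blocks inside one quasi-repeat unit.

```python
import random
import re

# X1 motifs listed in Equation 2.
MOTIFS = ("SGGQQ", "GAGQQ", "GQGPY", "AGQQ", "SQ")

# One repeat unit per Equation 1: n3 quasi-repeats (2 to 20), each with
# n1 [GPG-X1] blocks (4 to 8) followed by GPS and a poly-A run of n2 (6 to 20).
REPEAT_UNIT = re.compile(
    r"(?:GGY(?:GPG(?:SGGQQ|GAGQQ|GQGPY|AGQQ|SQ)){4,8}GPSA{6,20}){2,20}"
)


def quasi_repeat(n1, n2, rng):
    """Build one quasi-repeat unit: GGY-[GPG-X1]n1-GPS-(A)n2."""
    if not 4 <= n1 <= 8:
        raise ValueError("n1 must be from 4 to 8")
    if not 6 <= n2 <= 20:
        raise ValueError("n2 must be from 6 to 20")
    # X1 is drawn independently for every [GPG-X1] block.
    blocks = "".join("GPG" + rng.choice(MOTIFS) for _ in range(n1))
    return "GGY" + blocks + "GPS" + "A" * n2


def repeat_unit(sizes, rng):
    """Concatenate quasi-repeats; sizes is a list of (n1, n2), so n3 = len(sizes)."""
    if not 2 <= len(sizes) <= 20:
        raise ValueError("n3 must be from 2 to 20")
    return "".join(quasi_repeat(n1, n2, rng) for n1, n2 in sizes)


def is_repeat_unit(sequence):
    """Return True if the whole sequence fits the pattern of Equations 1 and 2."""
    return REPEAT_UNIT.fullmatch(sequence) is not None
```

A call such as `repeat_unit([(6, 8), (4, 6)], random.Random(0))` returns one candidate repeat unit with n₃ = 2, and `is_repeat_unit` can be used to check whether an arbitrary amino acid string fits the same pattern exactly. Sequence identity thresholds (e.g., at least 95%) are not modeled here.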

In additional embodiments, 3 “long” quasi-repeat units are followed by 3 “short” quasi-repeat units. Short quasi-repeat units are those in which n₁=4 or 5. Long quasi-repeat units are defined as those in which n₁=6, 7 or 8. In some embodiments, all of the short quasi-repeats have the same X₁ motifs in the same positions within each quasi-repeat unit of a repeat unit. In some embodiments, no more than 3 quasi-repeat units out of 6 share the same X₁ motifs.

In additional embodiments, a repeat unit is composed of quasi-repeat units that do not use the same X₁ more than two occurrences in a row within a repeat unit. In additional embodiments, a repeat unit is composed of quasi-repeat units where at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 of the quasi-repeats do not use the same X₁ more than 2 times in a single quasi-repeat unit of the repeat unit.
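The long/short definition and the limit on consecutive identical X₁ motifs can be expressed as simple checks. The following Python sketch is illustrative only; the helper names are hypothetical, and the input is assumed to be the ordered list of X₁ motifs already extracted from a repeat unit.

```python
from itertools import groupby


def classify_quasi_repeat(n1):
    """Classify a quasi-repeat unit as 'short' (n1 = 4 or 5) or 'long' (n1 = 6, 7 or 8)."""
    if n1 in (4, 5):
        return "short"
    if n1 in (6, 7, 8):
        return "long"
    raise ValueError("n1 must be from 4 to 8")


def longest_motif_run(motifs):
    """Length of the longest run of identical consecutive X1 motifs."""
    return max((sum(1 for _ in group) for _, group in groupby(motifs)), default=0)


def satisfies_run_limit(motifs, limit=2):
    """True if no X1 motif occurs more than `limit` times in a row."""
    return longest_motif_run(motifs) <= limit
```

For example, a repeat unit whose quasi-repeats have n₁ values of 6, 7, 8, 4, 5, 4 is classified as three long units followed by three short units, matching the arrangement described above.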

Thus, in some embodiments, provided herein are strains of yeast that recombinantly express silk-like polypeptides with reduced degradation, increasing the amount of full-length polypeptides present in the isolated product from a cell culture. In some embodiments, the strain expressing a silk-like polypeptide is a P. pastoris strain comprising a PAS_chr4_0584 knock-out and a PAS_chr3_1157 knock-out.

Equivalents and Scope

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments in accordance with the invention described herein. The scope of the present invention is not intended to be limited to the above Description, but rather is as set forth in the appended claims.

In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process.

The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

It is also noted that the term “comprising” is intended to be open and permits but does not require the inclusion of additional elements or steps. When the term “comprising” is used herein, the term “consisting of” is thus also encompassed and disclosed.

Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

All cited sources, for example, references, publications, databases, database entries, and art cited herein, are incorporated into this application by reference, even if not expressly stated in the citation. In case of conflicting statements of a cited source and the instant application, the statement in the instant application shall control.

Section and table headings are not intended to be limiting.

EXAMPLES

Below are examples of specific embodiments for carrying out the present invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the present invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.

The practice of the present invention will employ, unless otherwise indicated, conventional methods of protein chemistry, biochemistry, recombinant DNA techniques and pharmacology, within the skill of the art. Such techniques are explained fully in the literature. See, e.g., T. E. Creighton, Proteins: Structures and Molecular Properties (W.H. Freeman and Company, 1993); A. L. Lehninger, Biochemistry (Worth Publishers, Inc., current edition); Sambrook, et al., Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Methods In Enzymology (S. Colowick and N. Kaplan eds., Academic Press, Inc.); Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pa.: Mack Publishing Company, 1990); Carey and Sundberg, Advanced Organic Chemistry, 3rd Ed. (Plenum Press) Vols A and B (1992).

Example 1: Production of Recombinant Yeast Expressing 18B

First, we transformed a strain of P. pastoris to abrogate KU70 function to facilitate further editing and engineering. A HIS+ derivative of Pichia pastoris (Komagataella phaffii) strain GS115 (NRRL Y15851) was electroporated with a DNA cassette consisting of homology arms flanking a zeocin resistance marker and targeting the KU70 locus. A map of the cassette is shown in FIG. 1, and sequences are provided in Table 10. Transformants were plated on YPD agar plates supplemented with zeocin. This resulted in abrogation of KU70 function.

Then, we modified this strain to express a recombinant gene encoding a silk-like polypeptide. A HIS+ derivative of Pichia pastoris (Komagataella phaffii) strain GS115 (NRRL Y15851) was transformed with a recombinant vector (SEQ ID NO: 462) to cause expression and secretion of a silk-like polypeptide (“18B”) (SEQ ID NO: 463). Transformation was accomplished by electroporation as described in PMID 15679083, incorporated by reference herein.

Each vector included an 18B expression cassette with the polynucleotide sequence encoding the silk-like protein in the recombinant vectors flanked by a promoter (pGCW14) and a terminator (tAOX1 pA signal). The recombinant vectors further comprised dominant resistance markers for selection of bacterial and yeast transformants, and a bacterial origin of replication. The first recombinant vector included targeting regions that directed integration of the 18B polynucleotide sequences immediately 3′ of the AOX2 loci in the Pichia pastoris genome. The resistance marker in the first vector conferred resistance to G418 (aka geneticin). The second recombinant vector included targeting regions that directed integration of the 18B polynucleotide sequences immediately 3′ of the TEF1 loci in the Pichia pastoris genome. The resistance marker in the second vector conferred resistance to Hygromycin B.

Example 2: Generating a Library of Single Protease KO Mutants

After successful transformation and secretion of 18B in a recombinant Pichia pastoris strain, 65 open reading frames (ORFs) encoding proteases were individually targeted for deletion (Table 2). Cells were transformed with a vector comprising a DNA cassette with ~1150 bp homology arms flanking a nourseothricin resistance marker. A plasmid map comprising the nourseothricin resistance marker is shown in FIG. 2, and sequences are provided in Table 11.

Homology arms used for each target were amplified by the primers provided in Table 7 and inserted into the nourseothricin resistance plasmid to generate cassettes comprising a nourseothricin resistance marker flanked by 3′ and 5′ homology arms to the target protease, as shown in FIG. 3A and FIG. 3B. In FIG. 3A, the resistance cassette (Nour Resistance Cassette) is shown flanked by homology arms (HA1 and HA2). In FIG. 3B, details of the nourseothricin marker are shown, including the promoter from the ILV5 gene from Saccharomyces cerevisiae (pILV5), the nourseothricin acetyltransferase gene from Streptomyces noursei (nat), and the polyA signal from the CYC1 gene from Saccharomyces cerevisiae.

The homology arms in each vector targeted one of the 65 desired protease loci as provided in Table 2. Transformants were plated on YPD agar plates supplemented with nourseothricin, and incubated for 48 hours at 30° C.
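The cassette layout described above (a resistance marker flanked by homology arms of about 1150 bp on each side of the target ORF) can be sketched in Python. This is a minimal illustration under stated assumptions: the function names, the 0-based half-open coordinates, and the toy sequences are hypothetical, and in the work described here the arms were obtained by PCR with the primers of Table 7 rather than computed.

```python
def homology_arms(chromosome, orf_start, orf_end, arm_len=1150):
    """Return (upstream_arm, downstream_arm) flanking chromosome[orf_start:orf_end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    if orf_start >= orf_end:
        raise ValueError("orf_start must be smaller than orf_end")
    if orf_start < arm_len or orf_end + arm_len > len(chromosome):
        raise ValueError("not enough flanking sequence for the requested arm length")
    upstream = chromosome[orf_start - arm_len:orf_start]
    downstream = chromosome[orf_end:orf_end + arm_len]
    return upstream, downstream


def build_cassette(upstream, marker, downstream):
    """Concatenate arm, marker, arm: the ORF itself is absent from the cassette."""
    return upstream + marker + downstream


# Toy example with placeholder sequences (not real P. pastoris sequence).
toy_chromosome = "A" * 1200 + "ATGCCCTAA" + "C" * 1200
up, down = homology_arms(toy_chromosome, 1200, 1209)
toy_cassette = build_cassette(up, "G" * 10, down)
```

Because the cassette carries only the flanking sequence and the marker, homologous recombination at both arms would replace the ORF with the marker, which is the outcome selected for on nourseothricin plates.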

TABLE 2 Proteases targeted for deletion in P. pastoris strain

Protease Gene Symbol        Protease ORF sequence    Protease polypeptide sequence
                            (SEQ ID NO:)             (SEQ ID NO:)
PAS_chr4_0584 (YPS1-1)      1                        67
PAS_chr3_1157 (YPS1-2)      2                        68
PAS_chr3_0299 (YPS1-3)      3
PAS_chr3_0303               4
PAS_chr3_0866               5
PAS_chr3_0394               6
PAS_chr1-1_0379 (MCK7)      7
PAS_chr1-1_0174             8
PAS_chr1-1_0226             9
PAS_chr3_1087               10
PAS_chr3_0076               11
PAS_chr3_0691               12
PAS_chr3_0815               13
PAS_chr1-4_0164             14
PAS_chr3_0979               15
PAS_chr3_0803               16
PAS_chr2-1_0366             17
PAS_chr3_0842               18
PAS_chr1-3_0195             19
PAS_chr1-4_0052             20
PAS_chr2-2_0057             21
PAS_chr1-3_0150             22
PAS_chr1-3_0221             23
PAS_FragD_0022              24
PAS_chr2-1_0159             25
PAS_chr2-1_0326             26
PAS_chr1-4_0611             27
PAS_chr1-1_0274             28
PAS_chr4_0834               29
PAS_chr3_0896               30
PAS_chr3_0561               31
PAS_chr3_0633               32
PAS_chr4_0013               33
PAS_chr2-1_0172             34
PAS_chr1-4_0251             35
PAS_chr4_0874               36
PAS_chr3_0513               37
PAS_chr1-1_0127             38
PAS_chr4_0686               39
PAS_chr2-2_0056             40
PAS_chr2-2_0159             41
PAS_chr3_0388               42
PAS_chr3_0419               43
PAS_chr1-3_0258             44
PAS_chr4_0913               45
PAS_chr1-1_0066             46
PAS_chr2-2_0310             47
PAS_chr1-3_0261             48
PAS_chr2-1_0546             49
PAS_chr2-2_0398             50
PAS_chr4_0835               51
PAS_chr1-1_0491             52
PAS_chr2-1_0447             53
PAS_chr1-3_0053             54
PAS_chr3_0200               55
PAS_chr1-3_0105             56
PAS_chr3_0635               57
PAS_chr4_0503               58
PAS_chr2-1_0569             59
PAS_chr3_1223               60
PAS_chr2-1_0597             61
PAS_chr1-1_0327             62
PAS_chr2-2_0380             63
PAS_chr3_0928               64
PAS_chr1-3_0184             65

Example 3: Testing Single Protease Knockout Clones for Reduced ProteinDegradation

Resulting clones were inoculated into 400 μL of Buffered Glycerol-complex Medium (BMGY) in 96-well blocks, and incubated for 48 hours at 30° C. with agitation at 1,000 rpm. Following the 48-hour incubation, 4 μL of each culture was used to inoculate 400 μL of BMGY in 96-well blocks, which were then incubated for 48 hours at 30° C. Guanidine thiocyanate was added to a final concentration of 2.5 M to the cell cultures to extract the recombinant protein. After a 5-minute incubation, solutions were centrifuged and the supernatant was sampled and analyzed by western blot.
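The volumes in this protocol imply two simple calculations, sketched below in Python as a worked example. The protocol states only the final guanidine thiocyanate concentration (2.5 M); the 6 M stock concentration used in the example is an assumption chosen for illustration, as are the function names.

```python
def inoculum_dilution_factor(inoculum_ul, medium_ul):
    """Fold dilution when inoculum_ul of culture is added to medium_ul of fresh medium."""
    return (inoculum_ul + medium_ul) / inoculum_ul


def stock_volume_for_final(final_molar, stock_molar, sample_ul):
    """Volume of stock (in μL) to add to sample_ul so the mixture reaches final_molar.

    Solves stock_molar * v = final_molar * (sample_ul + v) for v.
    """
    if stock_molar <= final_molar:
        raise ValueError("stock must be more concentrated than the target")
    return final_molar * sample_ul / (stock_molar - final_molar)


# 4 μL of culture into 400 μL of BMGY.
dilution = inoculum_dilution_factor(4, 400)

# Hypothetical 6 M stock added to a 400 μL culture to reach 2.5 M final.
gdn_scn_ul = stock_volume_for_final(2.5, 6.0, 400)
```

Under these assumptions the subculture is roughly a 1:100 dilution, and the stock volume needed is a little under 300 μL per well; a different stock concentration changes the volume accordingly.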

Western blot data for a representative clone of each protease knock-out are shown in FIG. 4. Single protease deletions showed no discernible impact on the distribution of 18B silk fragments detected via western blot.

Example 4: Generating a Library of Protease Double Knock-Outs

In addition to the individual KOs, different pair-wise combinations of proteases were knocked out. These proteases were selected, in part, because they were paralogs that may have compensatory function with respect to each other.

To generate double knockouts, nourseothricin resistance was eliminated from the single protease knock-out strains produced in Example 2, and a second protease was deleted by transformation with a second nourseothricin resistance cassette as provided in Example 2. Transformants were plated on YPD agar plates supplemented with nourseothricin, and incubated for 48 hours at 30° C. Double protease knock-outs tested are provided in Table 3.

TABLE 3 Protease double KO strains of P. pastoris expressing silk-like polypeptide

Double KO Strain    Protease KO 1       ORF SEQ ID NO:    Protease KO 2       ORF SEQ ID NO:
1                   PAS_chr1-1_0379     7                 PAS_chr3_0299       3
2                   PAS_chr3_0394       6                 PAS_chr3_0303       4
3                   PAS_chr4_0584       1                 PAS_chr3_1157       2
4                   PAS_chr3_0076       11                PAS_chr1-4_0164     14
5                   PAS_chr4_0584       1                 PAS_chr3_0299       3
6                   PAS_chr1-3_0195     19                PAS_chr1-4_0289     66
7                   PAS_chr3_0896       30                PAS_chr2-2_0310     47
8                   PAS_chr3_0394       6                 PAS_chr3_1157       2

Example 5: Testing Double Protease Knockout Clones for Reduced ProteinDegradation

Resulting clones were inoculated into 400 μL of Buffered Glycerol-complex Medium (BMGY) in 96-well blocks, and incubated for 48 hours at 30° C. with agitation at 1,000 rpm. Following the 48-hour incubation, 4 μL of each culture was used to inoculate 400 μL of BMGY in 96-well blocks, which were then incubated for 48 hours at 30° C. Guanidine thiocyanate was added to a final concentration of 2.5 M to the cell cultures to extract the recombinant protein. After a 5-minute incubation, solutions were centrifuged and the supernatant was sampled and analyzed by western blot.

FIG. 5 shows representative results from different protease double knockout strains. As shown, despite the presence of protein degradation in all single knockout strains tested, the combination of PAS_chr4_0584 + PAS_chr3_1157 protease knockout (Strain 3 from Table 3) resulted in the near-complete elimination of 18B degradation products. None of the other combinations of proteases resulted in the elimination of degradation products.

Example 6: Additional Protease Knock-Out Strains

As shown in Examples 4 and 5, a modified Pichia pastoris cell capable of producing a desired protein (e.g., 18B) was transformed to delete proteases at PAS_chr4_0584 and PAS_chr3_1157 to mitigate degradation of the desired protein. We further knocked out one or more additional proteases to enhance the production of full-length products and minimize degradation.

For each additional knockout, an additional protease gene was deleted from a single protease KO (1×KO), double protease KO (2×KO), triple protease KO (3×KO), or quadruple protease KO (4×KO) by transformation with a nourseothricin resistance cassette with homology arms targeting the desired gene as provided in Example 2. The protease genes knocked out in each strain are shown in Table 4:

TABLE 4 2X-5X KO Strains

KO Strain    Protease Genes Knocked Out
2X KO        PAS_chr4_0584 (YPS1-1), PAS_chr3_1157 (YPS1-2)
3X KO        PAS_chr4_0584 (YPS1-1), PAS_chr3_1157 (YPS1-2), PAS_chr3_0688 (YPS1-5)
4X KO        PAS_chr4_0584 (YPS1-1), PAS_chr3_1157 (YPS1-2), PAS_chr3_0688 (YPS1-5), PAS_chr1-1_0379 (MCK7)
5X KO        PAS_chr4_0584 (YPS1-1), PAS_chr3_1157 (YPS1-2), PAS_chr3_0688 (YPS1-5), PAS_chr1-1_0379 (MCK7), PAS_chr3_0299 (YPS1-3)
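The stepwise relationship among the strains of Table 4 (each strain carries the knockouts of the previous one plus one more) can be made explicit with a small Python sketch. The dictionary layout and helper names are hypothetical and serve only to illustrate the table.

```python
# Knocked-out genes per strain, transcribed from Table 4.
KO_STRAINS = {
    "2X KO": ("PAS_chr4_0584", "PAS_chr3_1157"),
    "3X KO": ("PAS_chr4_0584", "PAS_chr3_1157", "PAS_chr3_0688"),
    "4X KO": ("PAS_chr4_0584", "PAS_chr3_1157", "PAS_chr3_0688",
              "PAS_chr1-1_0379"),
    "5X KO": ("PAS_chr4_0584", "PAS_chr3_1157", "PAS_chr3_0688",
              "PAS_chr1-1_0379", "PAS_chr3_0299"),
}


def added_genes(parent, child):
    """Genes knocked out in `child` but not in `parent`; parent must be a subset."""
    parent_set = set(KO_STRAINS[parent])
    child_set = set(KO_STRAINS[child])
    if not parent_set <= child_set:
        raise ValueError(f"{child} does not contain all knockouts of {parent}")
    return sorted(child_set - parent_set)


def lineage_steps(order=("2X KO", "3X KO", "4X KO", "5X KO")):
    """Map each strain to the gene(s) added relative to the preceding strain."""
    return {child: added_genes(parent, child)
            for parent, child in zip(order, order[1:])}
```

Calling `lineage_steps()` lists the single gene added at each step: PAS_chr3_0688 for 3X KO, PAS_chr1-1_0379 for 4X KO, and PAS_chr3_0299 for 5X KO.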

The resulting cells were isolated on selective media plates (by auxotrophy or antibiotic resistance marker) and individual clones were isolated for further testing. Individual clones were tested by liquid culture assay under product protein producing conditions as follows: Isolated colonies of each strain were inoculated into 400 μL of Buffered Glycerol-complex Medium (BMGY) in 96-well blocks, and incubated for 48 hours at 30° C. with agitation at 1,000 rpm. Following the 48-hour incubation, 4 μL of each culture was used to inoculate either 400 μL of BMGY or 400 μL of YPD (Yeast Extract Peptone Dextrose Medium) in 96-well blocks, which were then incubated for 48 hours at 30° C. with agitation at 1,000 rpm.

Protein expressed by the cells was isolated and analyzed for degradation as follows: Guanidine thiocyanate was added to a final concentration of 2.5 M to the cell cultures to extract the recombinant protein. After a 5-minute incubation, solutions were centrifuged and the supernatant was sampled and analyzed by western blot.

FIG. 6 shows the results of a western blot of purified protein from the 2×KO, 3×KO, 4×KO and 5×KO strains inoculated in BMGY or YPD. As shown, the deletion of additional protease genes from the strain having the PAS_chr4_0584 + PAS_chr3_1157 protease knockout (Strain 3 from Table 3) resulted in the further elimination of 18B degradation products.

OTHER EMBODIMENTS

It is to be understood that the words which have been used are words of description rather than limitation, and that changes may be made within the purview of the appended claims without departing from the true scope and spirit of the invention in its broader aspects.

While the present invention has been described at some length and with some particularity with respect to the several described embodiments, it is not intended that it should be limited to any such particulars or embodiments or any particular embodiment, but it is to be construed with reference to the appended claims so as to provide the broadest possible interpretation of such claims in view of the prior art and, therefore, to effectively encompass the intended scope of the invention.

All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, section headings, the materials, methods, and examples are illustrative only and not intended to be limiting.

SEQUENCE LISTING

TABLE 5 Open reading frame nucleotide sequence for proteases targetedfor deletion in P. pastoris Protease Gene Symbol/Locus tag SEQ ID NO:Open reading frame nucleotide sequence (5′ to 3′) PAS_chr4_0584 1atgttgaaggatcagttcttgttatgggttgctttgatagcgagcgtaccggtttccggcgtgatggcagctcctagcgagtccgggcataacacggttgaaaaacgagatgccaaaaacgttgttggcgttcaacagttggacttcagcgttctgaggggtgattccttcgaaagtgcctcttcagagaacgtgcctcggcttgtgaggagagatgacacgctagaagctgagctaatcaaccagcaatcattctacttgtcacgactgaaagttggatcacatcaagcggatattggaatcctagtggacacaggatcctctgatttatgggtaatggactcggtaaacccatactgcagtagccgttcccgcgtgaagagagatatacacgatgagaagatcgccgaatgggatcccatcaatctcaagaaaaatgaaacttctcagaataaaaatttttgggattggctcgttggaactagcactagttctccttccaccgccacggcaactggtagtggtagtggtagtggtagtggtagtggtagtggtagtgctgccacagccgtatcggtaagttctgcacaggcaacattggattgctctacgtatggaacgtttgatcacgctgattcctcgacgttccatgacaataatacagactttttcatctcatacgctgataccacttttgcttcaggaatctggggttatgacgacgtcattatcgacggcatagaggtgaaagaactttccttcgccgttgcagacatgaccaattcctctattggtgtgttaggtattggactgaaaggcctagaatccacatatgctagtgcatcttcggtcagtgaaatgtatcagtatgacaatttgccagccaagatggtcaccgatgggttgatcaacaaaaatgcatactccttgtacttgaactccaaggacgcctcaagtggttccatcctctttggaggtgtggatcatgaaaaatattcgggacaattgttgacagttccagtcatcaacacactcgcttccagtggttacagagaggcaattcgtttacaaattactttaaatggaatagatgtgaaaaagggttctgaccagggaactcttttacaagggagatttgctgcattattggactctggagctacgctaacgtatgctccttcttctgttttaaattcaattggccggaacctgggcggctcctatgattcgtcaagacaagcttataccattcgttgtgtttctgcatcagataccacttctctggtattcaattttgggggtgctacagtggaagtttccctgtacgatctacagattgcaacatattacaccgggggaagtgccacgcaatgtcttattggaatattcagctctggaagtgatgagtttgtgctcggtgataccttcttgaggtcagcctacgtggtttacgatcttgatgggcttgaagtgtcgcttgcccaagccaacttcaacgaaaccgattctgatgttgaggctattacctccagtgtaccttccgctactcgtgcatccggatacagttctacatggtctggttctgccagcggtacagtttacacttcggttcagatggaatccggtgctgcttccagctccaactcttctggatcgaatatgggttcctcttcctcatcgtcctcttcatcgtcctcgacttccagtggagacgaagaaggagggagctccgccaacagggtccccttcagctacctttctctctgtttggtagttattctcggcgtgtgtatagta
tag PAS_chr3_1157 2atgatcatcaaccacttggtattgacagccctcagcattgcactagcaagtgcgcaactccaatcgcctttcaaggctaacaagttgccattcaaaaagtttatcattccaacgacccaaaggaccgtttaattaagagagatgactacgagtccctcgacttgagacacatcggagtcttgtacactgcagagatccaaattggatctgacgaaactgaaattgaggtcattgtcgacactggttctgccgacttgtgggtcatcgattccgacgctgccgtctgtgagttatcctacgatgagattgaggccaatagcttttcctcggcttctgccaaattcatggacaagatagctcctccatcacaagagctcctggatgggctgagtgagtttggatttgctctcgatggtgaaatttctcaatacctagccgataaatctggacgtgtttcgaaaagagaggaaaatcaacaagatttcaacattaaccgtgacgagcctgtgtgtgaacagtttggttccttcgattctagttcttccgacactttccaaagcaacaattcagcttttggtattgcttaccttgatggaaccactgctaacggaacttgggtcagggacacagtccgcatcggcgactttgccatcagccaacagagttttgccttagtcaacatcacagataactacatgggaatcttgggtctcggtcctgctacccaacaaaccaccaatagtaacccaattgcagcaaacagatttacttatgatggtgttgtggattcattgcggtcccaaggatttatcaattcagcatcgttttctgtttacttgtctccagatgaagataacgagcacgacgaattcagcgacggagaaattttatttggtgctattgatagggccaagatagacgggccatttagacttttcccatatgtcaatccttacaaaccagtttaccccgatcaatatacttcctacgttacagtgtccacaattgcggtgtcttcgtcagatgaaactctcattattgaaagacgtcctcgtttggcattaatcgatacaggtgccaccttctcctatttgccaacctacccattgattcgtttagcgttttccatccatggaggctttgaatatgtttctcaattgggactatttgtcattcgtacaagttctctgtctgttgctagaaataaggtgattgagttcaagtttggtgaagacgttgtgatccaatccccagtttctgatcatctattggacgtctcaggcctttttactgatggccaacaatactccgcattaactgtacgtgaaagtcttgacggactttccattctaggtgatacattcatcaaatcggcctacttattctttgacaatgaaaacagccagctgggtattggtcagatcaacgtcactgatgacgaggatattgaggtggtcggtgatttcactattgaacgagacccagcctactcctctacttggtctagcgatttacctcatgaaacacccactagggctttgagtactgcttcagggggaggccttggtaccggaataaacacggccacaagtcgtgcaagttctcgttccacatctggctctacttcacgaacttcttctacatctggctctgcttctggtacttcttcaggtgcatcttctgctactcaaaatgacgaaacatccactgatcttggagctccagctgcatctttaagtgcaacgccatgtctttttgccatcttgctgctcatgttgtag PAS_chr3_0299 
3atgaaccctagcagcttaattctacttgcactcagcattggctactccattgctgagtcaaatttctctttcaaacccagcaagttacctctcaaaaaacatcgtgattcttcttccccgcatgaacgatttcttaaacgagatggaccctatcatccgctagaagccgacgcttacttttactacactacgtctatattggttggatcagaagaagaaaaagttgaagtaacagttgatttaggaacctctgatttatgggtcgtcgattacaacaccggtttatgtgatagatcctttgacgaaacctatcttaaacgtagtctggatacttctgaggaagattattctgctggagatcttggctcctcagtcggtgtacgcagcgctagaaaattcttgcgcaaaagggacaccaatcaaactgaggttaatgaagctaactatggtgcttgtccaaattcgattaccttcaatccagaaaactcgtcttctttccagagtaatgatactgctttcaatatcagctactttgatggaaccagtgctagtggtttttgggctactgatacaatttactttggtgaccttgaggtcagcgagcaattttttgggctggcaaacttaacaataagttatggaggagtcttaggtcttggcccttccaacctacaaacaaccaatgctaaccccaacggtgaggaattcatttacagcggagtcttagattccatgcgtgatcaagggcttatcaactcggcttctttctcaatctatctcaatccagagaatttcagagatgaagataactattctaatgaaggagcgattttgttcggagcaattgataatgcgaagattgacgggtcattgaagctgttaccatacgtgacttcaggtggacactctcagattgatgctaatttcacttacatcaccttgaataatattgccgtggctgacaatgatacagccctgatcgttgagaccaacccccaattggcaatgttgaatccaaagtttatatacacctattttccaaacgaagtattgacccggctggtaaactctattgacaatctagaatatgatcctgttgaggggttatataggataaggagaacaaacattagggatattaacaaaaaaatcatagagtttcaatttggtgacgagattgtgatacattctcccttatcaaattatctgtctgatacatgggttccaagcacaaactacacctatttggagattcaggatagcagagaggatttctttatccttggtaatgcatttttcaagtctgcgtatttgttttttgacaatgataacagtgaagtcggtattggccaactaaaggttaccgataaggaggacatcgttccagttggtgaattttctttggatcaagattcagggtactcgtcaacctggtcaacgttctcctatgaaactggttcagctcccttgggtacgtcaactttcgaaacgagtacaaaaactagttcagatggagctgccccgtcggtgtctcacattaacactagttcctacttatttgcgtttgtactacttttcctttag PAS_chr3_0303 
4atgttgcccatccgcttatccaaacttctgcttttgctctccttaaagttgaaattgggtacagctgaagaaaaataccaaaagttggatttaaaaagaattgacaaagactattatgccgtcgatgtcaaagtcggctccgatgagcaggagatcaaagaggtactaatagatacgggttcatctgatttctggatcttggacaaatcgttctgtaattctccaacatcagaggaagaagagaacagtaacgggcgtagcaacaaggaaagctgtggagtctatggctcgttcgactccaacaagtcagagacatttcaggcaactggccaagtatttgacgctgcttacggtgacaccacagccgagtcgacaggatcttcaggagttcgaggaattgatcagctacgggtaggagatattcatatagaagaactctattttggactagtgacaaacactacaagtttaccacccgttttaggaattgcccagctttccgaagagttcagcaacaactcttatcctaactttccataccagatgaaagaggaaggtctgattgatgttgttgcatactctctctccttgggccaaagtaaaggtgaactactgttcggggctatggaccactcaaaatataatggaacactattgaaagcccctatattgcaggcgggcacaccaggaatgcaagttcttttaactggagtggcccttacaaatggttcatcaagcgtcttcaatgagacagacaataaaggttttatctactttgacagtgggactactgcttccactctgccatcagagcactttgatgatcttttcaaccatcacggatgggcgtacgatggtgatacattgacatattcgattcaatgcgatagtgagggagaaaaatctttacttgacttcactttagaatataccattgctggtaatattgtcatcaaagtaccatttgaagacattattatgaagaatgaaaatgatggagaatgcctctcaaccgtaatggtgtcgaaccagacttctttttcatattccgatgacacaccctttttcgttgctggagacgaagttctgttgaacgcttatgttgtttacaacctagaaacacaagagctggccattgctccagcagtggataatccagaagatactgaagaagatattgagattatctccgcagactttgatatttcagaagccagagattatagcgttggattagagttcagaaataccacaattccagctacaactgattacttgccttcctcgatgtcgtcaggttcagtcagcgaagagactggttccaagtctgagagctctacttctgaggactttgctgcagccacgttgaaaccatttacattttggggtttcgtcctttttttctttcactttttgatttga PAS_chr3_0866 
5atgttagttgctgttgccctagtgttgttactgtctacaggctatgctggaatcgtcgccattgataccgaatatgagttcaccattggttttcttagtacgatagaaatagggtttcccccacaaagcataacggctcaatgggatacaggatcgtctgacctcttggtcaattccgtgacaaattcacagtgtgctcaggacggatgtagctttggtgcgttcgccttcaacaaatccaccacttattccaatataacaaaccctaacaaccttcatgttcagttctcctttgcaagcggcagcgtggttgatgacaaacttgtgagtgacactatttttgtagattccaaggtaatcccacggttcaactttgcactggtatcgaagggagacctgtatggtgataatatttttggtattggaccgagagggaaccagggaacattcgattccaatggaactccagctttctatgatagctttccttatcacttgaaggccctcggtttaatcaaacgactggcttactcattttacactgggcccacccagggaaaggtagtatttggaggggtggatcatggaaagtacgatgggtgcctggagaaactcgagattgtccatgacagtgctttttacacactgcttgaggcaattgatgctgatgatacttccgtcttggatgagcaaattcatgttttgtttgatactggtaccgccttgacactttttcccagctttattgctgaacaactggctgattttttgaaagctacatattcggacgaatacaatacgtttgtagttccctgcgaccaagattttgattttgaataccttcattttggttttcgaaacattaagttgtcggtgcgctttaaggatctgtttttagtcattgacgatagtgtttgtgctgtggggtttgatcaaggggcagatgcaaacaagataacctttgggtcttcacttttaagaaactactacacgctttatgatctagattccaaagaaattttgattgctgacgtcaagcctgatggtccagacgatattgaaatattatcgggtccagttcaacgaatttgtgatgaaaagggtgtcagtagcacttcattatggagtagtctgagtatagagtccacgatagaaccagacacttttaccactaagccttctatttcccagacacggtattcgactagctccattggacctcaaaacatttctaactctttaggtgaatatccttcagtttccgtcactctttctgaacaccataacactacttccatagcctcaaattcctcattagaagggaaaccagcaactccaactgttacagaccagtcgtaccagaataataagactacctctaccgtaattgctgtgaatttgattacccattcaaccactcattcaaccactcattcacccacctattcaaccactcattctagtaatggatcacgctcaactttagagtacacttcaaccaaggaatcctcggtgaaaatgccctgtgcgttgatcatctccgacacaattccgtacaatgcttccggtgggaatagtagttatggatcgttaatttcaacatctacggttaacaatgttgaagagaataattcaaacactgttagaccaagaaaaagacagaccttcgtttcgggaaccacttccacgatactactctattcctcaactacgacccaagcatatcagatgttgtcctcaacttcaatcccccgaccatccataaaagccagttcaaatgctggtagccgcaaaacttcaaagacattattaacatttatcatattgtatattttttagPAS_chr3_0394 
6atgtaccaggcgttgttggttttgtctctgatatgcttttcgtcggctaattttgttaagctgcgaagcaacgctggtatgttttatgatactatggctggagttccacgttcagatgaagagttctggttgcgtttggatattaaccaaggtctctcttggactctggatagtagctactactcctgtaatggctcaaatgtttcgtcttccctgtgtttcaattctgctcaaaacgtttacgatgcttccaatagtccaactgcagatttcgttgatgtctacgcaaacacaactgtaaacaatacagatgaggcatcggccgagagagtaaatcttacaaacaacttatttgctgatggcgtttatatggaagacaatttttacgtcacattgaataatggagcaagaatgactgctacagatctgaaatttttgaatgcccacaatagtagcgccgctgtggggtctttggcgttggggagttacacctcacaggacgtgccaactttcttacaaagactccaaagcggtggtcttattgaatccaactcgttttcattggcattaaacgaaatcgattcttcatatggagagctctatttggggacaataaactctaccaagtatgtcgagcctctggtagaattcgattttattccggtgtcagatcccaatggagtttttggattcgattgggaagatacattccctacagttccgatcagcggattaagcatgtcttcgaatgacaaacagagaactgtctttttccccaatgagtggaacaacacggtcttaacgggaacatacccacttccaatgatgttagattcaagaaacatctttatccatcttccattctcttcaatcatacatatagcagtgcagcttaatgcactgtatcttgatacacttcataaatgggccgtgaactgttctgttggtcaactggacgcaactttaaactttcacatgggtaaccttaccgttcatgctcctatcaaggagttgatttatccagcataccaaggagacaaaaggctgagctttgctaatggagaagatgtttgtattcttgccatggctcctgatgtttacattggttatccactgctaggaaccccctttttaaggaatgcagtggttgccgttaatcatgattcaaaaaaggtcgccgttgccaatcttaatagagatagcattcctcccgcttcgaacgtttctgtttcggaatcaatgggagtttatgttcctccacctgtttcaacttcaagaacatcggagagaccgtccacactagatgagactagtacagccaattttgacaaaagggaagagtctgcaatatcatcaagttcagtcactaacagctcgtctagaaattcttcaaccataacttcttcaggaactcaaaccgagcaaacatcaggcatagctaccatcgaaacagatagcataccaggagctctagggaataatttaactgattattcaacgctgactctaacaatatacaccaattccgaagtggacgaactcaatcctaacatagcaacagcattcatttccaatggttctatttattcagagccttaccccttttccggaactgcagttgctgaatcattcagtgcatcaccttcacaggctgaaggatcgaactcatcgtcctcaggatcttctttagttttgtgtttctttacatcattggccagtctgttgactgtgagctgtctactactgtaa PAS_chr1- 
7atgtttgtgatccagctggcattcctatgtctaggcgtcagcctaaccactgcacaacctagttcacctttcaaggcaaataagtttccttt1_0379taaaaaggttcactactcatcaaaccctagcgatcgccttattaagcgagacaactataagaagcttgacttgagacatcttggcgtcttgtatactgcggaaattgaaattggttcaggcaaaactgaaatcgaagttattgttgacaccggatctgcagatttgtgggtaattgactcaaatgcagccgtatgcgattgtcctatcttgagatacaaggtacaagtgtttccacccttagtcaaactgccaacgtaacacccctatcaggtaaacttttgaatggacttcaagaaattggcattgtaactgatggcaaaatttccaaaaagtttcaggaaaaccatcttttgaagagaaacgaggccttgaattttgatgtcgatctgaataagcccatttgtgatcaatttggatccttcaatccacagtcatcaagaacttttcaaagcaacgacacagcatttagtatcagatatctggacaactcttttgccaatggatcgtgggtgagggatacggtttatgttggtgattttgaaattgaccagcaaagttttgcattggttgatatcacaaataactacatgggaattctgggccttggtccttctagtcagcagacaaccaatagtgatcctacagataacagtttcacttatcttggtattctggattctttgcgggcccaaggattcattaattcagcctcgtactcggtttatctggccccagatggtaagactgatgatactgatcacgatgatggtgagatcctgtttggtgctatcgacgaggctaaaattaatggacagttgaagttgtttccatatgtcaatccttataaatcggtataccctgaccaatacgcttcatacatcaccgtttccagtattactgtagccagttattttagtagccgcttggttgaaagaatccctcaattagctcttttagacactggtgccacattttcttacttgccaacttatacgctgatacgtctcgcctatgccatccatcctggttttgagtatgtccgacaactgggtttatttattatagagtcaaacgtactctccagtgcgagacaaagtaccattgacttccggtttggcaaagacgtagtaattcgatccaatgtttcagaccatctactcgacgtatcacaatacttcacatctggacattatcttgcacttaccatccatgaaagtgtcgatgggcttctcattttgggtgacacgtttatcaagtccacctacttatttttcgacaatgataacagtgaattgggtattggtcagatcaaaattaccaatgacgaggatattcaagaagttggtgaattcaccttagaacgcgattcagactattcttctacatggtccatttactcttatgaaacttctttggatcccttaagcactggcactggtacggggtcaacctattctcctactcgcagtactacagctagaagcgaaccgactacgtctcgacgctccaccacccttcaacccagaacaactgtgattccttctattgacaggctttcattgaacagcataactagtcatggttcctctactaacggaacctccccaactaatgagacttcttttgctgaggatggaggaactttgacacccgaagaagcttctttgacaacttcactaaattctgctactatttctgagactacttttgtcgatgttgaaacttctactaccaatggtgcttcagttgtatctttgagtgttggtccctgcattattgccttcctactactcatctcttaa PAS chr1-1 
8atgagcatgggagctactgtttcaaaggagtccactgtagacctaacactgccgctgttgcagctgagtccaagactgttgttcctgcctgg0174agttgtctacaagacgactttcaagttccaggagggggtcaacatcttgctacgttttagagacctgttcgatgagtctttttctgaaagaaatgacgttctaggtgatattgcccgctcgcagaaggaacaacaggaaaacgattatgaccatatcccttttttgagcagcaatgctaagaagagcataggtgtcctgaaagaccaacttgaacttggtgggtctgatgacaagtcacttccctgggttattgcctgtctccctgggttcgaccagtcagaccaggactccattgccactacaatttgtcagataactgaggtgtccgtcgttaaccaggatattgtactatccttcgaagcattaaccagaggatctttaaaatccaaaaagaccatctccatgaatgaatcaaccatatctgtggaagtggatataccatttactgaggttgaccagaccatcagtaacaagctcatcttgacaaatattgataagggtctgcaactactggagaatatcaaacagtttctagtcacctatcaaaatgacatgatgaaccttgaagatactaccatggaaaagaactcccgtctaaagtctgcaatgatgattttggctccgttgtctcacttgatctacgccactgtctcatctcaagaatccactcatgcttatactagactatccaaccagtacaagtccgctaagaaggaattagattcaaccaaaaacagaaagtctttactcaagaagattttgaaaactaatgatattctcacttcagtgttccccttcagtatggttcaaaaggtggatgtcttgggagctatttcaagttctacagacaggatccaaacaactatcgacgcgttggactttgccaatccacttttcgaaacatatttgaacgttgattatgttctggagacatggaaagattttgacactaagaacggcaaaattgctgccaatttgaccaggtctcaattagtatctaaccacttgaagggcctcagagtactgattgaagacatccaaggaacttcaagaaggcgggtcagtccttctcagagaactcgtttggcgccttcgcccaatacaaattctgcaaatcaggcaccgaaagctggagaatcagacgacgaaaataaagaattgcgtgattttatcaacaacctctccaaattgaagatctcagaggatggaaagaggctcgttaccaaagatttcaacagaatgactcaaatgcaaccaagttcatcggagtaccaactgctcagaacttatttagagattattatggatatcccatgggaaacaaaaaatattgtaaaacaacaaatttttgatctagacaaggccaaagaaacactagatcaggaccattacggaatggactccgtcaaagataggatcttagagtatttagcagttcttaaactccacgatcacattaaaacgtccaaccccaagcaagaagacgaggaaatcaaagccagagcacccattctcttactaacaggtccacctggtgttggtaaaacttcgttaggaaaatctattgcaaaggctctgaacaaaaagttccagcgagtaagtcttggaggattgaaggatgagtccgaaattaagggacatcgcagaacttacgttggagcaatgccaggactattgacccaagcactgaggaaatctcaatcttttgatccagtgatacttttggatgaaattgacaaggttgtcgatggatcccaaggccctggtagtcgtgtaaacggtgatccagctgctgctttgcttgaagtgttagacccagagcaaaattctaacttctctgaccattatatcgggttcccacttgacttgtctcgtgttgtttttatctgtacgtccaacgat
atgagcatgatcagtgccccattaagggatagaatggaggttattgaactgaatggctacaattatttcgaaaaagtggagattgttaaacaattcttattaccaaagcagatcaaaagaaacggactgcctacgaatgccgaatcaccatcggtggttattcctgacgaagtgattatgtacatcgctgtcaattatactcgggagccaggtattcgtaatttggaacggttaatagggagtatctgtcggggtaaggctattgaatactctagcttgatgagtagtactcaagctccaggcgaaattccaaagggatacgtttccaaggtcacggtagataatctttcaaagtacattggaatacccccggaattgtctacaggcaagaatatgaggaatgattcagctatctctaaaaagtacggaatcgtgaacggcctcagttacaatagtagcggacatggaagtaccctagtctttgaaatgaccggtatacctaatagtactaacactaacatgattacgaccggcagattgggtgatgttcttacagaaagtgtcaagatcgcaagaacaattataagatcgatgtttagtcacaacttactacaattaaaggatgacgaaacttcaacttctggggatcttttgaagaggtttgacactactcaggttcacatgcatgtgcccgctggtgctattcaaaaagacggacccagtgctggaatcaccattacgctgtgccttctgtcggtgatgctagagaaacctgtaccaagggatttggccatgactggagagattactttgagagggatggtactgccaattggaggtgttcatgagaagctactaggagcacatttaactggaaccgttaaaagggtgatccttccaagaagtaatcgaagagatgtcattcaagactttatctctaacttggaagccaataacagaagttctagggataagctactggtagatcttatcaaagaggaggagtcattactgtccaactcaaataaatccgaacgaattggagtgttcgggcttcctgaaaaatgggttcaagagaagttgggacttcaagtgagctacgtggaagaattttgggatgttatccagattgtctggaacgatcaggttgaaattgacagcaccaaattacacgagctagctactaaagagttcgcaaggctatgaPAS chr1-1 
9atgcaattgcgtcattccgttggattggctatcttatctgccatagcagtccaaggattgctaattcctaacattgagtcattacccagcca0226gtttggtgctaatggtgacagtgaacaaggtgtattagcccaccatggtaaacatcctaaagttgatatggctcaccatggaaagcatcctaaaatcgctaaggattccaagggacaccctaagctttgccctgaagctttgaagaagatgaaagaaggccacccttcggctccagtcattactacccattccgcttctaaaaacttaatcccttactcttatattatagtcttcaagaagggtgtcacttcagaggatatcgacttccaccgtgaccttatctccactcttcatgaagagtctgtgagcaaattaagagagtcagatccaaatcactcatttttcgtttctaatgagaatggcgaaacaggttacaccggtgacttctccgttggtgacttgctcaagggttacaccggatacttcacggatgacactttagagcttatcagtaagcatccagcagttgctttcattgaaagggattcgagagtatttgccaccgattttgaaactcaaaacggtgctccttggggtttggccagagtctctcacagaaagcctctttccctaggcagcttcaacaagtacttatatgatggagctggtggtgaaggtgttacttcctatgttatcgatacaggtatccacgtcactcacaaagaattccagggtagagcatcttggggtaagaccattccagctggagacgttgatgacgatggaaacggtcacggaactcactgtgctggtaccattgcttctgaaagctacggtgttgccaagaaggctaatgttgttgccatcaaggtcttgagatctaatggttctggttcgatgtcagatgttctgaagggtgttgagtatgccacccaatcccacttggatgctgttaaaaagggcaacaagaaatttaagggctctaccgctaacatgtcactgggtggtggtaaatctcctgctttggaccttgcagtcaatgctgctgttaagaatggtattcactttgccgttgcagcaggtaacgaaaaccaagatgcttgtaacacctcgccagcagctgctgagaatgccatcaccgtcggtgcatcaaccttatcagacgctagagcttacttttctaactacggtaaatgtgttgacattttcgctccaggtttaaacattctttctacctacactggttcggatgacgcaactgctaccttgtctggtacttcaatggcctctcctcacattgctggtctgttgacttacttcctatcattgcagcctgctgctggatctctgtactctaacggaggatctgagggtgtcacacctgctcaattgaaaaagaacctcctcaagtatgcatctgtcggagtattagaggatgttccagaagacactccaaacctcttggtttacaatggtggtggacaaaacctttcttctttctggggaaaggagacagaagacaatgttgcttcctccgacgatactggtgagtttcactcttttgtgaacaagcttgaatcagctgttgaaaacttggcccaagagtttgcacattcagtgaaggagctggcttctgaacttatttag PAS_chr3_1087 
10atgatatttgacggtactacgatgtcaattgccattggtttgctctctactctaggtattggtgctgaagccaaagttcattctgctaagatacacaagcatccagtctcagaaactttaaaagaggccaattttgggcagtatgtctctgctctggaacataaatatgtttctctgttcaacgaacaaaatgctttgtccaagtcgaattttatgtctcagcaagatggttttgccgttgaagcttcgcatgatgctccacttacaaactatcttaacgctcagtattttactgaggtatcattaggtacccctccacaatcgttcaaggtgattcttgacacaggatcctccaatttatgggttcctagcaaagattgtggatcattagcttgcttcttgcatgctaagtatgaccatgatgagtcttctacttataagaagaatggtagtagctttgaaattaggtatggatccggttccatggaagggtatgtttctcaggatgtgttgcaaattggggatttgaccattcccaaagttgattttgctgaggccacatcggagccggggttggccttcgcttttggcaaatttgacggaattttggggcttgcttatgattcaatatcagtaaataagattgttcctccaatttacaaggctttggaattagatctccttgacgaaccaaaatttgccttctacttgggggatacggacaaagatgaatccgatggcggtttggccacatttggtggtgtggacaaatctaagtatgaaggaaagatcacctggttgcctgtcagaagaaaggcttactgggaggtctcttttgatggtgtaggtttgggatccgaatatgctgaattgcaaaaaactggtgcagccatcgacactggaacctcattgattgctttgcccagtggcctagctgaaattctcaatgcagaaattggtgctaccaagggttggtctggtcaatacgctgtggactgtgacactagagactctttgccagacttaactttaaccttcgccggttacaactttaccattactccatatgactatactttggaggtttctgggtcatgtattagtgctttcacccccatggactttcctgaaccaataggtcctttggcaatcattggtgactcgttcttgagaaaatattactcagtttatgacctaggcaaagatgcagtaggtttagccaagtctatttag PAS_chr3_0076 
11atgaagctctccaccaatttgattctagctattgcagcagcttccgccgttgtctcagctgctccagttgctccagccgaagaggcagcaaaccacttgcacaagcgtgcttactacaccgacacaaccaagactcacactttcactgaggttgttactgtctaccgaactttgaaaccgggcgaaagtatcccaactgactctccaagccacggtggtaaaagtactaaaaagggtaagggtagtaccactcactctggtgctccaggagctacctctggtgctccaactgacgacaccacttcgactagtggctcagtagggttaccaactagcgcaacttcagttacctcttctacctcctctgcaagtacaacaagcagtggaacttcagccactagcactggtaccggtactagcactagcactagcactggtactggtactggtactacaggcacaggaaccactagttccagcactagctcttctgctacttcgactccaaccggttctatcgacgctatcagccagacacttctggatactcacaatgataagcgtgctttgcacggcgtcccagaccttacttggtctaccgaactcgctgactacgcccaaggttacgccgattcatacacttgtggctcttcattagaacacacaggtggaccatacggtgaaaatttggcctctggatactctcctgctggcagtgtagaagcatggtacaacgagatcagcgactacgatttctctaacccaggttattctgctggtaccggtcacttcacccaagttgtctggaaatcaactacacagctgggctgtggatacaaggagtgcagtaccgacagatactacatcatctgcgaatacgcacctcgtggaaatattgtttctgccggctacttcgaagacaacgtcctgcctcctgtttga PAS_chr3_0691 12Atgactgtgcaaattttgattgtagttaccagtgttgctaagtatgaaagcggaaagctgccaacaggcttgtggttaagtgagttgacacatatgtatcatagtgcaaaagagaacggctatgatgtgacgattgcgagtccgcaaggcggaaacattccgcttgaccctgaaagcttgaaatcaatgctgattgacaagctttcaaaggattatgagacaaaccaagactttatgaagttgttgcaaaacacaaaaagtttgggtgaagtcacaggacaacagtttgacgttgtttatttggcaggtggacacggaacaatgtatgactttccgaacaacactgttttacaaaacatcatcaaagaacactatgaggcgggcaaaattgttgccgctgtatgtcacggagtttgtgggcttttgaacgtaaaactgtctgatggcgagtatctaatcaaagacaaggccattacaggatttaattggtttgaagaagctatagcaggacgcagaaaagaagtaccgttcaaccttgaagcagaattgaataaaaaaacttcaaaatacgagaaagcttttatcccaatgacgtcaaaagtggtcgtggacgggaacttaatcacaggacagaacccattcagttcaaaagaaattgcgaaagtggtaatggaacaactgaagcaataa PAS_chr3_0815 
13atgattgatgagaagcaattgaatcaacccaaaaggagcgtcttaagacgtctccatatgctgtttctgccattactagctatctcctttttcctgatatatttaagtgatatcacacagcctctcttccgtgcccgaaaggaagacgaaaacccgttggaaatttacttgaaggcattggaaacgaatgaagctcacaaatggtcaaaggtgtacacttcgcagcctcatttggccggaaccaactacggattggttgagtttactaagtccaaatttgaagaatatggatttgaggccagtgtcgatgactacgatgtgtacctgagttaccctattgatcatagtttggaattgtatgagcattctgaggataaaaatgacaagctcttgtataaggcttcgctgcaagaggacgttctctctgaagacccaactacttcaggcgacgacctgatccctaccttccttggttacggtgctaacggcaatgtatctgcagaatacatctacgctaactatggaaccaaagaggactttgaggatttggtggcccgtggtgttccaatcaaggggaagatcgcagtcattagatatggtcaaatatttagaggcttaaaggtgaaatttgcccaagaatatggcgcaatcggtgctgtcatatacagtgacccaggcgacgattatggtatcacccctgaaaatggttacaagccttaccctcatggtaaagccagaaacccaagctctgtgcaaagaggttctgcccaatttttgtctgtttatcccggtgacccaaccacgccaggagttggatcgaagaagggagtagaaagagttgatcctcatgctacaaccccttccattccagtcttgcctttgagtttcaaagatgccttgccaattttgaagaaacttaataaggaaggattgtctgttcctgactcctggaagggaggtctcgagggagttgattacagtaccggcccagctaaaaacattcatttgaacctttatagcgaacaaaactttactattacacctatttacaatgtctatggagagatcaaaggtgagaatgctgacgaagttatcattattggtaaccatcgtgacgcttggattaagggaggtgcttctgaccctaacagtggatctgctgctttgattgaacttagtagaggtttgcacgccctaaccaaaacaggatggaagccacaccgtactattgtactagcttcctgggatgctgaggaatatggcttgattggatctactgagtttggagaacagtttgagaagttccttcagaagaaggtcgttgcctatttgaacgttgacgttgctgtagctggaactcatcttcatttgggtgcctcgccatctttgttcaaactattgaaggataatgccaaagaaatcactttcaagaattcaaccgagactttgtatgacaactatgttaaagatcatggcaacgacattatttcgaccttaggaagtggaagtgactacactgtctttttggatcatttgggaattccttcgcttgatattggtttcattgctggaaaaggtgacccagtatatcactatcattcaaactatgattcgtaccactggatcagtactagtggtgatcctggatttgagtatcataatgtactggccaaatatttgggttcgttggttttgaatctctctgagagagaggtgttgtacctgaagcttcatgattatgctaccgaattgctcaagtacctcttggaagcctacgcccaaatgccagaggaatgggacgatgaagtaattggtttcagatcttcctcgtgtcatcgtgcgaaagcatctcatcatggtaaggatcctcatcatgagggaagacgccatcacggaaaaggattccattctaaaggagggcctcatcatggggaacgccatcacggaaaaggattccacgctgaagggggaccccaccatgagaaaggaccgcatcac
gaaaaagggctccacgtcgaaggagagccccatcatcagaaaggacctcactttgaaaaaggattccatcatgacatggagatgtaccataagaaattggctcatcacggtaaagaacccaagacgaagctaaagcacttgaagaaacaagttgagagtttaatcatcgatttcgccaataccactcaaacatatgacgcttacactgacttccttcagaagcaacatgagattagggattctctttcattctgggagaaaatcaagctacattttaagatcaaggcagctaacttcaaacttaaatattttgagcgagttttccttcatgaaaatggcttaaagaacagagaatggttcaaacatattgtatatgctgcaggaaggaacactggttacgccggacaaagactgcctggtcttgtggaagccattgaagacaagaatctgcatgatgcagtaaaatggcttcacatcctttccaagaagattgatagtctacagaagtcattagagtag PAS_chr1- 14atgagattacttcacatttcattgctatcaattatctcagtattgaccaaggccaacgctgaatgttgttacaccaacacacatactaccac4_0164tgaagtctggtatactacagtatatgctcgagatgttagtgaagagacttcttccacactggctggtggaagtgcaactgtcagctcagaagtgagttcgacaattgaatctagcgttgccacttccgctaccaccgaatcttcaagtgagacatcagggtccacatctgggtccacatctgccactgaatcatcaactggtagtagctcgctagcaaccagttcatcgataaccagttcagagtcttccaccattacacaaaccacaggacaagagtcaacaagcccaaccccatcgtcctcagagacaggttcttctactactactccctacgatataagtccaacggcaagttccgactttgatgcttttaaatatcaaattcttgatgaacacaacataaaaagagctctacatggagttgacggattagagtgggatgaagaagtatatgctgccgcccaagcatatgctgacgcatacacttgtgacggaaccttggttcactctggaaatagtctgtacggagaaaacttagcgtatggttactcaaccagagggactgttgatgcctggtacagtgaaattgaatattatgactttaataacccaggttataccccaggtgttggacatttcactcaagtagtttggaaaagcaccacaaagctcggctgcgctttcaagtactgcaatgactattacggagcctacgtggtatgcaactactcaccaccaggaaattatgtcaacgagggatacttcgaagccaatgtgttaccactggtagattaaPAS_chr3_0979 
15atgagttatcccctaggtctgggtcgtacagcttataggttcatcccgaggtcaatctgttcaagacgatccatctcatcccatgcattacctccaacgccctccaactcaccaccagcaggagatttattcaccaaactgctgaacgaacgcatcatatatttagcaggaggcattgatgatgcgcaagcaacatctatcacggctcaattgctgtatctggaatcgcagtcaacgtcgaaacaaatcaacatttacatcaactcaccaggaggttctgtcacggcagggctggccatctacgacacaatccagtatatccgagcgccagtttccacggtttgcttaggacaggcatgctccatggcatccctcttgcttgcaagcggaacgcatggcaaacgtttgatcttgccaaacgctaccataatggtgcatcaaccatcttcggcaaacggaattaagggacaggccactgatatcgagatatatgcccgtcatatcatcaataccaaacagaaattgcaaactttatacctaaaacacatgtctccaaccatgacggtggatgaaatcactgcacttttggagagagatcggttcatggagccagaggaggcagtgtctcttggactggcggaccgtgtattagagaggaaacccccggttgtatctgactaa PAS_chr3_0803 16atgacagataccaaggagttagccacgttgctggagaacttgttgaaattgcaaaaatcaggaagtcttggtgaaattgtgggtcaagcacagcgcatttatcatgacatttctgacctctcagtcctatctggattatcaaccccagaagtgctctctcctcacacatctccagatgtccccgagagagttccatctgaagtcaacttagacaattccaatctggcaactgatgtcaacgaaaaggagaagtattttgacgattttgcaaatgactacatcgagtttacctacaagaaccccaccacctaccatttggtgcaatctgtggcggaattgttgaagaaaagcggattcgaatatcttcctgaagcagctgactggtccaaattattcgaccctgaaaagacgggagcgtatttcacaatccggaatggaacctctttagctgccttcacaattggtagtttctggtccccagccaagggagtaggagctatcggaagtcacatcgatgctctcacaactaagctgaagccagtctccaataagagtaaggttgatggctacgagttgttgggagtttccccctatgctggtgctttgtctgacgtctggtgggatagagatttgggtattggtggaagagtaatttacaaaaatgaatcttccggcaagctttccaccactttggttaacagtacacctcatcctgttgctcatattccaactttggcccctcattttggtactccctccaacggtccattcaacaaggaaacccaagcagttcccgttgtaggattttctgacggaaacgacgaggagaaacccactgaggatgaacaaaagtctcctttgattggtaagcattctttaaaactactccgctacatatctaagctagcaggagtgccagtgtcctccttgattgatttcgatttggacatattcgatgtccaaaaaggtactaggggcggtctttccaatgagttcatttacgccccaagagtggatgatcgtatttgttcttactctgctctacaagcgcttatcagacgtcacaaggatcccgaatcctttgtcacagacgactctttcaatcttgttgccctttatgacaacgaggagatcggatctctctccagacagggagccaagggtggtctacttgagtcgaccatttccagagcaatcgctgcattgaaaatttcagagccagggactctgcaaagactatatgcaaattcagtgattctttctgcagatgtcacacatttgttaaatcccaatttcaccg
aagtgtacttggagcaccacaagccactgccaaacacagggattgcacttgcgctggattcgaatggccatatggccacagatttgttaggcaaggtcgttgttgagcagctggctaaactcaatgatgataaagtgcagtacttccagattcggaacgattcaaggtctggagggaccattggacccagtatttccagtagtactggcgctagaaccattgatcttggaattccccaattgtccatgcacagtattcgtgctaccgtgggatacaaagatgttggcctcgctgtcaagtttttccaagggttctttaaaaattggagaaaagttgtcgacggcattgaagagttttaa PAS_chr2- 17atgacttcggtatttttgggtgtttatagagccctatttgattaccaagctcaaaatgacgaagaactaactgtgcatgagaatgatctact1_0366atacgtattggaaaagtccgaaattgatgactggtggaaagttaaacaacgagttatcggagttaatgtcgaggaaccaataggtctggtacccagtacttatattgagcctgctacacctatcgggtcagctgttgcactgtatgattatgacagacaaacagaagaagaaattactttcaaggagaatgacacctttgacgtgtacgacaccgacgatcaggagtggatcttggttggcctgaacaatatccattttggtttcgtgcctgcaaactacatacaaatttctttgggtacgacggcacctgcttctaacaatccaccaatacttagtcccgccagcttccctccacctcctcaacggatcaacaactcctctgttccctctctcaaagatgctgaaccagcaagaaatctagaggacgataatgcttatgaagaggaggaagatgtacctccaccaatgccaacgcgaccaactgccactacagctacatctaatatctctgctcctcaggactctgaatccgaagaggaaccttctagtagtagcagaaggccaagtggccgttcaagggcggatgatgattttgtaaaaggagactatttcacttgggatgttcaggaaattaatggccgcaaaaagaggaaagctgtcctgggtatcggaaatggtagtatttatgtccaagcagagggacattcttctaagaaatgggatatcaggaatttgacaaatttcagtaacgaaaaaaagcacgtcttttttgactttaccaacccctcggcatcctatgaacttcatgcaggctccaaggacgcagcagatgccatcctgtcaattgttggtgatttgaaaggtgcttcttcaatgcgtgctttgaaagaggtgaaggctgcatcttctgccccaaaaaccaagactggtaaagtcagttacaacttcgatgctgaaagtcccgatgagttgtcgattagggagggtgatgttgtctacatattgaacgataaagaatcctctgagtggtggatagttcaggacgttaatactaacaagaaaggtgttgttccagctagctacatagagttgattagcgggggtggatctactttagccagcattggctcttctatttccaaaggttctaagaaagcttttggatcctccagaaaacgtaaggaaaaagagcgtaagcatttggaagagcaacgtgccgctaaaagagaaaccgaaagggaacgtcaaagacttcgatccaaggaagaaagggataggctaagaaagttagatgaaaaggaaagaaggaaaaagcaaaaagctactccacaggatgaagaccaacccgagactagcaaacctaatcctcatagagtgcgtacctggattgacagttcaggatccttcaaagttgaagcagagtatttgggagttgttgacggtaagattcatctgcataaaacaaacggtgtaaagattgccgtagcggctcctaagttgtcactagaggatttagagtatgtggaaag
aatcactggaatgtcgttagaaaaatacaagccaaagccaaaatctagtggttcctattccagaccttccaaaaagccatcctctagagaatcttcaccaaaggagtccagccgctccggagttaaacaatcagttcccaagattgatcctcccaaagacccagattatgattggtttcaatttttcttgggttgcgatattgatccgaataattgtcagcgatacagtgtggttttcattaatgaacaactggatgagagtagtttgcaagacctcactccatccctactaagatcgctagggttaagagaaggtgatattttgagagttcaaaaattcttggataacaagtttggtcgaaccaaagctcaagaatctgctaccaatggtggtttatttaccaagagtgatggtacattgaagaacaataggtccactgatgttctaacaagtacagttgtaacgcgagaaactttaagtcctactaaggccgaggctaagagcaaaagaattgatgacgaagcatgggctctcaaacccgctgccgaatctagctctcaaatggatcaattctccagacctgtcagtgcaatgagcaaacaattgactggatccatacaagatctcgtcaacttgaaacctttgggggacaatgcaaacaacgcttcggtagcccacaaagctgaaacaccaaacactacccaggacaaaccttctgctcctgtcttggaacctgtgaagactggagctgcaaggggacctgtgcaagcgcaaccaacaagtggtggtttcgtcactgcacaacctactggtgctctagttgcaatgcctacaggtttcatgcccattacgatggtgcccgtaaagacaggaggaactatagctcttcaacccactggtggattcgtttcgttgcaaagaactggtggggtacttccgcaggttacagggggacttgttcccgttcagactggtgggttagtaatgcctcagacctcatttggtgtaactccaactttgcagccaacaggagggattctacctgctcagaggacaggtggattggttcctgttcaaaggacgggggggctaattcccgtccaacaaactggaagattagttcctgttcaacaaactggaggattgattcctgttcaaaggactggaggattagttcccgttcagagaactggaaacttacaacctgtacctacaacctcttttggaagtcaaccaacaggaacttttgtgcctcaatcttcctttggtaatcagttggccaccaatttgaataacccgcaaaccacattcggctctcaaccaacaggaggtttccctcagacatcatttgcacaaaatcagtttagacaatcgacaggaggtttccagcagaccccaattgtgcaacaaacagggggattcccccaatactccgctggacaacagacggtaggattccctcagaactcttttggacagcagacaggaggaattgcccaaaactcatttggacaacagacaggaggttatcaaacaggttttcaaggaaatggatcgattccaatgccccagtcctcattcggtgcttcaaatctgggattcaatggtgctacgcagcagaactacaacattggcatgggccaatctttgccagcagcttctatccctccccttcaaccctcttacacctcatcactcaatggaatgtcaaacatgcttcagaacgtaagcatctctcagcagccacaacaagcccagccaatgacgacttttggagcacctgtggcccagcctccgttacaggctcaaccaactggctttggttttggtaactcgccctatggaggtcagaacccactccaatctcagccaacaggtaaaagagccaacttatcagcagctaccgcagacaacccattcggcttctag 
PAS_chr3_084218atgaccaaccaatcaacagtggtggatttacgcctttcatccaagagagttgttggcaaaccagtcaagttgcccacagtcctagcgtgctcagggtcagattcttccggtggtgcagggatcgaagcagatatcaaatccatcacggcttttgggtgctatgcgctaacagcaattacatctttaactgcccagaataccaaaggtgtcaccagtatagaaaacaccgacccaaagtttttcgaagagattttagaggcaaattttgaggacattgaaatcgatgtggtgaaaactggactgttaaaccctgagtcatctcgtttattgctgaaatttttagataaataccacaaaggaaagccatttgtcctggatccggtcttagtggctacgtctggttcaatgcttgcagatcaacacgaattagggttcaccattgattctcattttaagaaagctactatcattactccaaatttcgaagaggcatgtgtgatctactcttacttgaaaaagctgaagactgtagatgagttgggtgaaatagaaactttagaggatttgaaaggaatggccaagttcatccagcaaactacacattgcaactctgttcttcttaaaggtggccatattccctggaatagaaacgagcagttggttaaaaaaaagggaggagatccagcatacattactgatattctttatcagggtcatttggataaattcacggtaatcaagacagattacttgacaagttctggaactcatggttctgggtgtacgattgctgcctcaattgctgcaaacattgcccgttcgttgaagattgaggatgctgtaatttcttcgattagatacgttcatcaggcaatttttggagcagatgagacgctaggacaaggaaaaggccctttgaatcatgtgtttcatatttctcctcccattaacggcacaagtgctgagaataactttcttccgttctatccaggtcacttcttagattacttactggagcatcctttggtgagtcccatctggaagaactacatcaaccacccatttttagaaaacgtagcaacaaataagctggctaagaacagattcatccactacatttgtcaagattacgtgtatctagcttcttatgcccgtgtccacggcttagctgccggagttgcacctgatattgaaagcataaaggcagaagcccatataatcgactccatcatggaagaaatgcatagacataaagacgtattgaactctcgtggaattgtgaaactggatgaattaagaccctccaaggcctgcaaacagtattccgactacctcctaaacattgcgaagacatcagactgggtggccataaaaatcgccttagcaccatgcatctttggctactattacgctgccatttatgctcggtcgtttatcaaggatgaagctgacgtggacgaagaattcttgaattggatcaatacgtataccggtgattggtacaaagatgctgttgacgaggccagacagtcgctagaaagccatatgcaagctgtttctcccgtccagttagcagagctagtcaagatctttgcagatgtctgtcaattggaggtgaacttctggacttcgccaatggaactaccagaacaagatctatga PAS_chr1- 
3_0195 19 atgcctacagtggtgactaacgagtcctctctcttgcaaacaaccgtgagtgttgcaccattggtgcttttatctgttgttgatcactacgaacgagtggtgcaggcacccaacgccccaactaattcaaacgacaaaagagtcgtgggggtcattttgggagacaatacaaacaagaacttgatcaaggtaaccaactcatttgccatcccgtttgaagaagacgaaaagaacagggatatttggtttttggatcacgacttcatcgaatcgatgatggaaatgttcaagaagattaatgccaaagaaagacttattggatggtaccactctggaccaaagttaaagtcatctgatctacaaatcaacgagttattcaagagattcactccaaatcctttgcttttgattgtggatgtaaattccaccgatatagtcgatattcctacagactcatatttggcaattgaagaaattagagacgatggctcaagtgcagaaaaaacgtttatccatttaccatccatcatccaggccgaagaagcagaagaaattggagtggagcatcttctgagggatatccgagaccaggcgtgcggaaatctgtccataagattgactaacaatttcaaatcgctgaagtctttaaacgatcgcatagccaacattgtccaatatttgcgcaagattttaagtggagaattaccaataaataatgtaattcttggaaaattacaggacatattcaacttattgcccaacttggttgccgttcaaggtgatcccacaaaaccagccactgcaagtgctaaccaactagccacatcattcaatgtgaagaccaatgatgaattaatgatggtttacatctccagtttagtaagatccatcttggctttccatgatttgatcgacaataagatcgagaacaagaagaacaacgagaaagataaggaattcacaccaacagaggaagaaccccaacaagcggctatagaatcgaaataa
PAS_chr1-
4_0052 20 atgacaatgtcaaccgaagatatcatcgccaggcataggaaggagaaaagggaccaaattgcacttattacaaggatgaagaagcagagcactaagtcaaccaaaaaggaaatcatgaaacaatgctctctcttggaagaagagctacaggcaagacataagaaggagttaggtgagtgcaagactgaaaattccgtcgagagaagtagtgagcctactgacgaaaaatcaaatggtggagaacttttttcccctgaaaagttattatcaatgatgactttaaaacagcaaggaactccaagtgagaatcaaggaaacgcaactgttccaaagagaaaacgcaataggcagaaggacagattagctagaagggaagttgccattaaagagatgcaagcagcagcagcaaaagaggctaacctccaaacaaatttcaaagagatagaattgaacaacataagccaactgtgccaagttgctcacctggaaccatatgatatccgacctgatgggcattgcttgtttgcatctataaaagatcagttggaggttcggcacaaaattgaaaatataagtatacaagatcttcggtctctggctgcgagtcatattaaaaatgatcccgagacttatactcctttcctttttgatgagaatactatgaaaatcagggacattgatgactatgcaaacgagctggaaaccacggctttatggggaggtgatatggaaattttggcattgagcaaagagtttgattgtccaatcagtgtaatgattagtggaagacctattcatcttgtcaatgccgacggttctaaagaggagttgaagttggtttattaccgtcatgcatatggcctaggtgagcattacaactctttaagagatagatcagagataagggagtcttgtatagttgagcaagaggaaaaagaagcggtagacgatggaaaatcatcttcttga
PAS_chr2-2_0057 21 atgagacttaagatcaagcgttcaaatgaacagcggctaataacattgcctgacggggctacagtatcggatttacttaatgaaattggatcagcttctatcaatataaaggttgggtttcctcctcagacaattgatatctcagataccagcaagttgcttactgatagtggaatcaagaatggtgaaatgatcattgtcactgataccattgaaacagaagtgcctgtcaacaagaatgaggttgcaattgccactgtctcaaaccagaatgatgcgccctacgttcaaatagacgacatcttcctagtcttgcggaagattcccgatgataattcttgtttcttcaactctgtcggctactgtatatttggtcctgattcaatcaagtatccggattctcaacaagaactaagacaggccgtcgctaatgtaatcagagagaacaaccaaggtatttataactccgccatcttgggtggaaagtcaatcacagagtattctcagtggatccaaagcagtaattcctggggaggagccatcgaagcacagatattggcagaataccttgatatcagtatctggacagtggatattgagtctcttcaagtctacaaatttaatgatgaaatggcttcaaggttttgcgttattatgtatagtggtattcattacgacgctatggctctcaagctggacacatcattagatgaggaggactcacaaatttgtgtgtttgataagttcagtgagttggggactttgattgaagacaacgttctcaaattaaccaaccatcttaagaaccagggctattatacgaatacttccacattcatactccaatgtcaaatatgtctcgcaacattgcaaggagaaaaagaagcaaatagccacgcaaagaaaactggccacacaaattttggtgaagtcaattga
PAS_chr1-
22atgtcattgtctgatcctgaggacagcctaagacgtctacttgtgagtttaccctccaatgttaagtacgatgcggagtcttcggtattgaa3_0150aagccgactgaaccttgctctatatttctcgctgacaaagagaggtgaatatctgggttccttggtaacggacttgccaatggatttgccatcatcttattccgaaatcttagaggctgaagatgattcctactcaagattggctgaatcaatgtacaaatgccctaactataagcatcatggaagaccttgtgcaaggcagttcaagcaaggagagccgatataccggtgctacgaatgtggttttgacgagacttgtgtaatgtgcatgcattgttttaatagggagcaacatcgagaccacgaggtttccatttcaattgcttcgtcctccaacgatggtatctgtgattgtggagatcctcaggcatggaatatcgaattacactgccagagtgaactggaacaagatgaccattcaagttcagaagttaatccagattttaaatctgctataagggaaacaatggatattattttagattacattttggattgtactattcattctgcatctatgcttcctgctgttcaggacatgatgaaggaagacccatccgactatgaaatggctattcaatatgcttcagatagttcttctctgcccattgaaagatatggagtggaagacacgaatgttcagtcctggaacgtagtcctgtggaacgacgaattccataattatgatgaggctattgattgcatccagcaagttagtagatgttcattgtctaaaggacaagctgacgctcaaaagattaatgattttggattttccatcataagaagaagtgaatccttgcctttactgatagaaaggtgcgccaaggttgaagaatccgggtttactattacgattctttctgatagagatgttacccgattgattattattgatactatttttgattggttattgactctgttagaaatttcaaggccggaaattcagactgctattagagaaagtttgtgtgaatctcttttggaagagtttcatgccgacattcacgaaggagattttttctaccgggaagatgaatattcagacacacggggtttgctggatttcaaaaacagaattccagccccattggtggaggatgtaatgaacgagttgtctattgatgacttgaagaacagaaaactatccagttttcttaatgaacaaccttcagctctagtcggctcaagagtacagtatttcttctatatggatctgcggttctggaaaaaggcaagaaaatctttgaaattgctaacgacatctgttttggtttcaaacttggaatacaaaaagactttttctgaacagtttgtgaaaatatactcgcatctgttgatattgatggcaaaggaagatagagagtggcttctcagcaatgcgggcaatgctgtagtacaactctttacatgtcctaaaacatctctccatttattacaaccacaatatttcagaagcatcatcgtccccatcattttgttgttcgaatcttatactggaaaccatttgctgtggaaacgaccatatcaactcttatcacgtaagaaaggtctcaaatttggtttaatgcgttctttaactgatctagtgacgttaatcaccactgcccatcaatcagaagaacatttggtactttttcagggtaagaacttcatttacataatcatgctttttaggatgttccagagtgccctgacattggtcagaaaggaaggagaacatattaccagggaatccactgaatttttaacctacctgcaaatatcttactaccttaatgatgtcatcaaaggtattgttgaaattgcgcaggttcctgaaatacgtaaacctgaacattggaaagttgtggaaacaaacatacaaatattggccactttaatttcatcagaacct
tataagtttcatatggtgcacgaaaaacaacttattgaccatgacgtaacaaagaaaccaacctctcttattaatccattgaatggattactgtctaacatgttaacaaccgtaagggccaattctttttcatttttaactcgtcaagtttctcagattaatttttggagtatcaatcccgaagtctcattttcagatgatttagactatctgaaactctcatcgaagagtttagaagcaattactttgagttcacagataaaaattggccactggattagaaatggatccatgactagtaaacaagcgcaattgtactgcacgaggttcactcaatatggttacatagccgacgttcatttgaaccaacttgctatactcgaagaacgcgacgatgatcgtctattattaaacattttggatagattcaatctaatagattggttctataacgatcaggacgtgcttggtactgttttcgaagaacgatctttttacctaatgaatgaattggttaagtttctttataatatgttttcacacagagttaacttccagtttgaatcaaatttcacagagaaaacccagtatgaggtaacgcaatacattttatacacgctttgtaaaggatctttgtcattttcagatctgacagccgactttcctatctccgtggaagttactgtttttgacaagatccttgatgaggttgctgtttacgaagagcccaaaactatgaatgattctggaaagtattctatcaagaaaagttattacaaaaagatggatccaatgtctatttatgtggactcgggtgatttcgatgatgtatcaacagcgatagtaaaggaactttcaattttaggaaaaataaaagaggagaatgttgtaattgaacctcagatcagtggaccgaatgaatccaacagccgtgtcttgagcagattgaaacggttcttcattagcaaatctgtagtcaaactgttttataaattgttacaatctgctctttctgagagcaatgagacctacgtcattgaacttttacatttgattcaagcagttttattagatgaacatgaattgtacagaatcgaagatccagtgcaatactttattcaaattcctgtgtgtgatctactgttatcagttgttgagcacaatgatttttcacgacctgtctgcaaaaaactgaagttctattgaattggttgatccagcgggacgagtcaatcattgactcattggttgattcttttggtgaaaagcacattgaaaactttaaaaaatctaagggatctcaagttctggagactaaacgagctaaacaaaagcgtttagccaaggagagacaagagaagatcaaatcacgatttgctaaacagcaaaagtctttcatgaagcagaatttggacgcaaaaaagagtgcggaacatgtaactacacatttatccaaagacaatgaaggattaggtagttcctcccaggactcttttcatgagtgcattctttgtcaacgtgctcaggagggcaacgagatgtttggaatccctgcatatgttgaaaaagtttccacgttttgggattttcaacctaaggatgagtcaacctatacggaaagatgcttaacaaccattgaaaatcaaatgaaacaattgcatgaagaaacggatgccaacaatgaggttagagaacatctttattatcaaaaagatactcctgtaaaaagcatggcaccgatatcttcaagacacattgttaagtcatgcgggcaccacatgcattataaatgtttttctgagttactagaaaacagcaggaagtttagcacttgtccgctttgtcgctctgccattaatgcttttgttccacaatttgccatgaaaaacgatgctagccctgcttttcaggaggctgcttcgaatattagtcactttgaaaagttgaatttgaatcaaattgtatcgaaatatcttctcaatgattccttcttgaaattt
attgcggaagaaagtaaggaccagttcatgtatttgaatgagtttaaagacattttgaaagacgccccagatgcttctgaccacatgttgagtgaagggttatttccctcatttttggccatgtcaacattattgggtaataccctagcaaatactgaaattcgtctcagattatcccccgagaagattccccagaaaggaaacttgaagagaaaagattcggaattaataacctcattacttcaatgtgtctcggttatctcaatcttattgaaacaatcttatcctgaagagcagtatctgtctccatttttgaataaaccaaattcattaattattgattttgccatttcacttctacttggaaaagaagactcacttcaagaaactattgtgggcatttacaagcaaacaattctgcattcattgaatttactattgactaacgttggagataatgagcatttcagaaggatgctgagcggtgcaaactctattattaatgattcagaactggccattttcaaaaagtttgtgtcaacggccacttttacctctgatgtttcattcattacttgcaacgaacaattattggttggactgtatattcttttggagaaaaccaccacagtgtatcttaaacagttgtttctgataatcagcatgtgcagacccttggacttatgcctaaatcgtgactacgagaattccaatgattacgaccactatttgtttggccaactgtgcaaattttttaacctttccagtataatcagttatttgggatctggaattcctggtggaaacctattggaggagcaaaatgatcttatattaaaaggacaatccactctcccttcaacaattgagtatccaggtctcgtttatcttgtgaatttgcctagagaactgaacacttttactttttcaaaatatgacacccaagatgcagttaatctaaacttttctgtttgtttaacgtgtggcaaaagagtgaaacatagcggtgattctgaaaatgaaattgaaaacttccctgggtacaatggtgttcctcttactttgtttcaccatcataagaattgtcctttctctggatatggagaagcacaatgtatcttcttaaccccaaagttgaataaattgactgccttactaaagattcagcctccacgaggaatttctgatcgctcgctatatcacagtacatttgcattcccattgagcagcccatatctaaccacacatggagagtcacattctggtcatggaggcttgatacgcaaagcgttcctgaatagagatcgatttcgaaatctgaatgagctatggttggatggtgaactagctttgtatatttcccgaagccttggggattctcaaattgtagcggaaccaatcaaccctgttatgattacaatgccgggaggtattcaggaggcattaaatcttgcgttcaccactttcctcggtgaccaagaacccggggatgatgacttggaagattatgagtatgacatactgttaaatagatga PAS_chr1- 
23atgtctgcctttggtgtggttccgagtgtattaaacactggaaaccagatcaagcagaaaaacggaacgcttttcaagaaatcttctggagt3_0221ttacaataaacagcagcgggatcacaattccagggataaaaagcgatcagctcgtaaaacaaatacaccgccaacaccgactgagagtacttccgcaaagaagtcatcaactcaatcagacgacaaagtgagtcctgatattttacaattgtcgcatattgagattcaatatgtgggcccacttctttccaacccagaatctttgggatatgtgaaacaaaacaataataccaaaatcaagactccgaaatatttagtggatacagattcaaacctggtttttggtcctgatacaactaataaatgggatattgagaaccagcacaaaatgatcgaaatggaatcttcccatcaaggtgactggcaaggtatttatgaacaatttcaagaaatgaataaagtggagcgtcaaaaaatggaagatctgggcttggtggcaaaagagggacaaagcatggacctgacaaatgctatctcattcaaaggtagctgcgtggatatgtgtcccgtttatgatagagtcaagagggaggtacagagagatgttgatccattggagagagatcctgccactggtaagatatctcgagagagagctttaaagaaatttgtgcgtccttcaggccaagcaccgcctcttccttctgacgtaagacctcctcatattctggtaaaaagtttaaactatattgtggataatttgctggataaattaccgcaaagtcattcattaatttgggatagaacccgtagtatcagacaagattttacactacagagctactctggcttggaagcaattgagtgtaacgaaagaatttgtcgcatacatctactttgtgctcatataatgccgggttctgatcaatctgacttctccaagcagcaagaaattgaacaattcacaaaatcattgaaaacattaacagacatatatgatgttgtcagatccaaaggaggaaaatgtgccaacgaagctgaattcagggcttataatttgctggtgcattttcgggacccaaatctaattcatgaaatccagaacttacctactcgaattcttaaggacgaacgagttcaacttgctttaatgtttcgaagtctactattgaataataatttcaaagaataccagaggaacattcctggttgcttgggggtttttcagcagtttttcaatatgtgttttgatccagccaccccattcttaatcggatgtgtgctggaacttaattttgaagagataagattttacgctttgaaatcgatctcacgttcttatcacaagaaatctgcccctctaacgacccagaagttagcatctatgctcggatttgattccgaggataagctcctaactttcactaattatttcaagactcctacgtgtactaattctagaaatgaaacgtgcattgatatctcaaaacttagatacgagagttttacggatttggctgctccaaagcagatttacacttcaagattagacaacaaattaaaaggattcacctataaggatgttgttgatcaaggattaaataacacatccttgcacatagctaatttgaaagaaacaatggctcagaatcaacatattgcagtggagaaattacccaatatctcatttccacaacatgctttgtcttctacccctttcgaagtagaatcaaagtcagacatagtcagatcttcttccggatcggctccgccccagactttgatcccaccgattcaagaaaaagtaataacttctcaaatacagccaccaataactcccgtcgttcccactgaagaaatccaaactcttccaaaaatagaggagcccaggttcaaagatcttccaaattttgaaaatgcatgcaaagaggtttcctctattttaatcaagaagactata
tctcctttgattgctcccatagtgaacaatcagctagaagagtacaaccggcgacaaacggttttaagggatcaggagagacaaaatcaaagaagacaacttttgatttcatcccttcaggaagaattgtactctgcttttatacgagaacaagtgtatattcaagtggttgatactcaagccaaagagtgctttaacaagaatctgaaacggcgaatatttcagaaattcatcgggggtttaattacattgaaaaacaaacaaatgaataagagaagaaaacttgatgaaattcaagtcttcaagaataaggttgtttcctcaagtcaacttcggtattcagtttcaagaagtcaaacggaggacaattcaacgtcaaactcgagtgacgaggaagcatcagctgttcagatgaatattactctttcaccatctgtggatccactttggtcacccatagatattaagtttatattagactccaatttaaagttgtttgaggataacaaggataaatactggaatttcatgtttgcgattgccgattggactattctaccaagcaaatggcttcgttacaaattccaacttcaaaaccccagtctcataaatactgttgaatcctcaaattacaaagccaaattacgggctctacccagtgacaaacttcttacaagggaatacatggagcactgtcgatttttggtatttcaagtcggaaaggttgatgaatcatcaaacctgaaagaatctttgttcagagactcacagtttattaaccgattaatgaaatatgccaagaagtactcgcaataccagattggagtacttgtcttatattatcatgaggatgactcttttgataaacagaaaattattgatcttttgttattagaacaatacacaaataagttagtcaactcactcgagatagttgacatgaacaaactcacaaatgatgaactgataaaagcattgaccacgctagtccacaactataaggataaaggtatcaacaaatcggtaccaacatcttccaccaaaggacacaccactagcattatggaacaggatatgacagtatacagctacagcacgtccaattccagggatgctaagcttaattatattttgaagcaagcctacccccgcagggggtttcacttgaaacaatga PAS_FragD_0022 
24atgtcagaatggccctcagctttggaaaattttgtaagtcattgtttccagcgtgccaacattgagagctttccacccggcaaaaaaaaagaactccaaaaacagttgacgcaaatcatcaatttagcaattcttgaaaacaaacttaattctaataactggtccaaacaaaagctaccaatatttggagaagcaagagagttagaattggagcagaaaatgggaaatgtttatccaattactgtttctagtcgaagaagtgacttgatgcatcaagaggcagttcaaccatctgagcctttagttccctccgaaagccaacaaaagaaaaagtctagagaattgcgatttaagatcactaaaaaaagttctgtatcacccgcaaataaaatacaagttgcttgtgacttgaattgtaaacttgtgggaactaacacctctatcgagaaagattattatagacttacatctcatccggatccttccatggtaagacctttgcctattttaaagaaatcgttgcagcatctttacgccaaatatcaaagtctagaacgtttcaaagctctcagcaaggcagagtacagctattttttgaatcaactgaaatccctaaggcaagacctcacagtgcaagacattcagaatcagttcactgttaaagtttacgaatttaatactcaattggcgattcaaaatgaagattttggtgagcttaatcaatgtttgactcagctggcgcaattgtacactgtatcaactatgggtcatacttattactattctgatactggcaaatacaaccaagagcacaactgttttcttgccaaggatctttgtgaggatcgaaaccatatcaatatgttcaaatttacgagttatagaattttatattttcttctcatagacgccccctgggaattgctaaaaataaggcaggatttattcaaccgtggtcaacagtatgcaattcgtcacaacaaatttcttttgaagtcattcaagctttcggatctcataaccgccatggattatattcatatcaaggacgaatattcattcctcgtgaatatggactcagatgtctgcaatttaaggacagtgtttgatgacgaacatatgactttgaaccaagacgactggtttttctataagatactctaccataagattttcttacgagaacagctgaaggccctgataactataagcaaatcttatcgacagatatccctctactacttgaaaaatctactgatggatttagtattcttggaaaagaataagttatctcgtttcattgagaatggtgaggtatttaactgcacgagcgcaagatcattactgcttcaaatagagaagaagcagctatcaaagatagatatcaagggtcaggtatga PAS_chr2- 
1_0159 25 atggttgactcagagactatcaacaaattcatagaagtaacgggagcctctgccttccaagcaattcagtacctagaggagactgatgactttgaagcggcagtcaatgattattattcctctcaactggagaatgagaagggcaagggtaaatcagaacgtccagtcaatcaaacaaaggcttctgcagggcccaagatcagaactttcaacgacctaaatagcaactcaaatggggacaacaatcttttcacaggtggtgaaaagtccggtcttcaagttgagaacccagacaaacgtggggacccttttgggttggtcaatgatcttttgaagaaagctgaggaaactggccaacaaccagatacaaggccccatgaagaagctcctgctagacaatttgttggaactggccacaagctgggcagtacggacagtccctccgaagttagtgtctgaccctgcctcaagaataagaagagctcagaaagtcagccgacagataacattttggaaggacggattccaagttggagacggagatttatacagatatgatgaccctgcaaacgcaagatatctagccgacttgaacgctggaagggcaccactggctcttctagatgtcgagattgggcaagaggtagatgtcacagtgcataaaaagatagaaaaaaatttcactcctcctaagaaagcccgagttggctttcaaggtaaaggtcagagattagggtctccagtaccgggcgacataaagctcagtcaatctcctgaggtgcaacaagaaacacaagaggaagctgaggaggaaaagcaaaaggaggaggccgagcagctgggaactggggattctcccgttcagattagactcgccaatggtcagagaattgttcatagattcaattctactgattctgttgctcaattatatgcatttgtcaatgaacatagtccctccgccagagaatttgtgctttctctagctttcccggtgaaacctattgagaacaatgaggacacactcaaggatgctggactcataaacgctgttgttgtccaaagatggaaataa
PAS_chr2-
26atgggcgtgatacttccagacgatggtaagcaatcgggaggccaaccaaatagaagggctaaagtcctgagccgatttttaccaccagaaca1_0326tcaaagaccttcaatcggcctcttcctgggaccttttactccagcagctgataatgagattgccctgtggacttgcattggcgctcagctctttagtgggctggcattgcttagaatgagccgaagatttgttttttcgcccgatcaatctgtaagaaggtttctctttaagacttttcataatgtggtaggtgcagccctgatatttgggagcggattagaagggactaggatgcttctacctgaggatccttggaaagaagaagctagaaaagcaagaatattggcccaattgaaaggtgagcccgttagttggtggtatggacccaagagttttattccttctggaaggttagaatacacaaaacagatgcagtttcacaactttgaagtcatgcataaatcacccgaaaaaatagcccgagctctcatgattaaggacaaactcaaggaggaaacaaataccctttattcgtccattcatgagaaagcggaacaacagactattcgactctctaaagatctacagaacaacgttcccctcaaaggggtaacgtcatatgttcctcaatttagcacttcaaatacggacaccaagttatatttgaaaaatgttagcttgaagacccatgccgacctggaaaaggtctgggcagaacacaatccttgggacatcctggaagagaaaatttctccaatttccgtaattgcactgccaaagtttaacccaattatatctgaggttgaacctgacaagcagcaaccatctacgggtgatatcaaatacattagtgacagaaaataaPAS_chr1- 27atgaaatatttgccactcgttgctaccctggcctcttcggccctcgctgctggcatcaacttcgcccaattactggaccagaagccactgga4_0611cattgccgataatgttaaatgggaattgaagcctgaggtcgactctgctgctcttcaaagtgcagtcaatgagctagacttgaaaatcgaagccagctatttgtttaaagttgcacatggttccgtctttgaatacggacatcctaccagagtcatcggttctcctggtcactggtccacaatcaaccatgtcctcgacacattacataacttcaaacactactacgacgttgacgttcagccatttgaagcctttaccggtatccttaagtctttctcattgaccattaacggagttgcaccaaagtctgcagaagctttagatttaactcctcctactcctggcggttttccagtgaccggtccagtcgttttagttgataattatggttgtcaagcttctgactatccattcaacgtgactaacggaattgccttaattcaaaggggttcttgttcattcggtcaaaaatcagaacttgctggtctccgtggagccaaagccgctctcatttacaacaacgtgccaggtagtgctaagggaaccttaggtgccccaactcctcatcaggtaccatcgttgtcactttctcaggaagatggagaggccgtcaagcgtcagcttctgacttctggaagcgtaattgcaactgtcgctgtcgattcctacgttaagaagttcaaaaccaagaatgtgattgctaccactcgttacggtaatgatagcaacattgtgatgctaggtgcacattcagactctgttgctgctggaccaggtatcaatgacgatggttctggtaccatctctcttttgaacgtggccaaatacctaactaaattcaaagttaataacaaggttcgtttcgcttggtgggcagctgaagaagaaggattacttggatccgactactacgtttcaaagttaacccccaaggagaaatctcagattcgtttgtttatggactacgatatgatggcttcccct
aactacgcctaccaggtctataatgccactaacagcgagaacccagttggatctgaggagcttaagaatttatacattgactggtacgttgaacagggtctgaactacactctagttccatttgatggccgatccgactatgatggattcatcaagagcggtattcccggaggtggtattgctaccggagcagaaggtttgaagaccgaagaggaggctgaactatttggtggtgaagctggagttgcatatgacccatgttaccactctctttgtgacgatttggccaaccctgactatgttccatgggttgtcaatactaaattaattgcccacagtgtcgccacttatgcaaagagcttggacggattcccattgcgtgaggagcctagcccattcaagatgactgcccagtcaaacttcaagtaccacggtccaaaacttgtcctttag
PAS_chr1-1_0274 28 atgctcaaacactccttaaaaacagggttggtctttctcacttggataccggtgatttatacggtaaaggaacacctgatatacgttggaaaggtggaaggatcctcaatgtcacccactttgaatcccgttaaaggttattctgactatgtgattttatggaagttaaacttcaaagagtcactcaaagtgggagacgtggtttttataaggtctcctgtagatccagagaagttatatgctaaacgtataaaggctgttcaaggggataccgtggtgactaggcatccataccccaaagacaaagtgtccattccaagaaaccatctttgggtagaaggagacaatatacacagcgtggatagtaacaactttggtccgatatcgttgggccttgtattaggaagagcaactcacgtaatttttcccctgaacaggataggtaatatctctggtgaagggggtagagaagttagggaggattatttaagagcggaggacagtccgatgtaa
PAS_chr4_0834 29 atggtttctgaaattcagcttagattagctgttattatttatgatatactctgttcggcgtcttatgttctagtcatccatttgagaccaaccagagcccttccgcatcaacccatagaccgtaacaatcctctaacgattaaagaaaggtgccagcgagccagtgtgttgactgctacacatgtattattattgcctattcttttaaaagtgttgagactgtcagaaattgcggaaactacggcgaaacttggaatagtggtgggatatcacaaccagagctggtctttctctaacctccaagatgatattgtcagcattttcaaagctttaggtttgaccatgattctcttttctggtcctattgtagattatttttactattcaaactcaacagaagtaatcaagcaagatctggcgtatgtcgttagcctcgagggtatgcgtgatctacttgtgggacccatcactgaggaacttctttatcggtcatgttccatttcattaatgctagtagctaacgattacgccaacaaatttctgttcggccaacactggttaataatggtatcatcactctacttcggtatagcacatcttcatcatgctgttgaactgtatcattgtaaaagatattcattaactaccataaccatatcaactgccttccaatggtcatatacaacgttatttggaatatatgcaagctttctatacttgcgaacaggatctgtatggtcagcaatagttgttcattcattttgcaacatgatggggtttccccggttgacatttggacgtgatgaagcgagagattggaaagtgggttactatgtgttgctcgctctaggttccgtcctattcaaaaagtttctttactctctaacagaatctaaccatacgcttcttctataa
PAS_chr3_0896 
30atgtatcccgaacacaagtatcgggagtatcaacggagggtgcccttatggcagtactccctgttggtgattgtactgctatacgggtctcatttgcttatcagcaccatcaacttgatacactataaccacaaaaattatcatgcacacccagtcaatagtggtatcgttcttaatgagtttgctgatgacgattcattctctttgaatggcactctgaacttggagaactggagaaatggtaccttttcccctaaatttcattccattcagtggaccgaaataggtcaggaagatgaccagggatattacattctctcttccaattcctcttacatagtaaagtctttatccgacccagactttgaatctgttctattcaacgagtctacaatcacttacaacggtgaagaacatcatgtggaagacgtcatagtgtccaataatcttcaatatgcattggtagttacggataagagacataattggcgccattctttttttgcgaattactggctgtataaagtcaacaatcctgaacaggttcagcctttgtttgatacagatctatcgttgaatggtcttattagccttgtccattggtctccggattcttcccaagttgcatttgtgttggaaaataacatatatttgaagcatcttaacaacttttctgattcaaggattgatcaactaacttatgatggaggcgaaaacatattttatggcaaaccagattgggtttatgaagaagaagtgtttgaaagcaactctgctatgtggtggtctccaaatggaaagtttttatcaatattgcgaactaatgacacccaagtgcctgtctatcctattccatattttgttcagtctgatgctgaaacagctatcgatgaataccctcttctgaaacacataaaatacccaaaggcaggatttcccaatccagttgttgatgtgattgtatacgatgttcaacgccagcacatatctaggttacctgctggtgatcctttctacaacgatgagaacattaccaatgaggacagacttatcactgagatcatctgggttggtgattcacggttcctgaccaagattacgaacagggaaagtgacttgttagcattttatctggtagacgctgaggctaacaatagtaagctggtaagattccaagatgctaagagcaccaagtcttggtttgaaattgaacacaacacattgtatattcctaaggatacttcagtgggaagggcacaagatggctacatcgacaccatagatgttaacggctacaaccatttagcctatttctcaccaccagacaacccagaccccaaggtcattcttacgcgtggtgattgggaagtcgttgacagtccatctgcatttgacttcaaaagaaatttggtttactttacagcaaccaagaaatcctcaatagaaagacatgtttattgtgttgggatagacgggaaacaattcaacaatgtaactgatgtttcatcagatggatactacagtacaagcttttcccctggagcaagatatgtattgctatcacaccaaggtccccgtgtaccttatcaaaagatgatagatcttgtcaaaggcaccgaagaaataatcgaatctaacgaagatttgaaagactccgttgctttatttgatttacctgatgtcaagtacggcgaaatcgagcttgaaaaaggtgtcaagtcaaactacgttgagatcaggcctaagaacttcgatgaaagcaaaaagtatccggttttattttttgtgtatggggggccaggttcccaattggtaacaaagacattttctaagagtttccagcatgttgtatcctctgagcttgacgtcattgttgtcacggtggatggaagagggactggatttaaaggtagaaaatatagatccatagtgcgggacaacttgggtcattatgaatccctggaccaaatcacggcaggaaaaatttgggcagcaaag
ccttacgttgatgagaatagactggccatttggggttggtcttatggaggttacatgacgctaaaggttttagaacaggataaaggtgaaacattcaaatatggaatgtctgttgcccctgtgacgaattggaaattctatgattctatctacacagaaagatacatgcacactcctcaggacaatccaaactattataattcgtcaatccatgagattgataatttgaagggagtgaagaggttcttgctaatgcacggaactggtgacgacaatgttcacttccaaaatacactcaaagttctagatttatttgatttacatggtcttgaaaactatgatatccacgtgttccctgatagtgatcacagtattagatatcacaacggtaatgttatagtgtatgataagctattccattggattaggcgtgcattcaaggctggcaaataaPAS_chr3_0561 31atgaaaccgtatcaccatgcaaaaagccgcccaataggcagctacctgtattttggggtgtttaccgtagcattgacatttctgacgtggcttaaatatgacgcagagctgtttgctcagcaggttcactcgaaagacatttatgacccacagttcaacattacgttgccaattgatggcccaacatttaccccatcaaagaactattcaattagtgttcaaaatgcagcagtggcgtccgatatagaacaatgttcaaaattaggtgtatctattctgcagcaaggtggcaatgcggccgattcagcagtcaccgtggccctgtgtatcggaacaatcaattcgtattcgtccggtatagggggaggaggattcattgtctctaagttaattgataatcctaccgctctgagttttgattgtcgagaaatggctccttctaaaagtttcaaagaaatgttcaactatcatgaggagaaggccagagtaggtggtttggctgtcgccattccaggagagttaaagggactctatgaactgtttcagcaccatggttctggtaatgttgagtggaaagatttgattttgcccgttgctgagttggctgaggtgggatggactgtcgatccgctgttttctagtgcattgaaatctattgagcaccatatttacgagcattcatatgattgggcctttgcattgaatgaagacggaaaaattaaaaaaagaggtgactggattaatcgtcccatgttggctactacgttgaggagaatagctgaaagtggcaacgttgatctattctatgacccagagagcgatatagtacaaagcatggtgaatgctactagaaagtatggaggaatccttgaagcctcagactttgcaaaatatagagttcgaattgaagaatcgttgacattgcataactttacatctgacggccttacggtttatacgtccaatggggcatcctcagggttggtgctccttgctgggttgaagctcatggacttattcgaagatttcaaggaatttcataatgatttcggggctgttgagtctcaaaggcttgttgaaacgatgaagtggatggcttcagtaagaagcaaccttggagatttgaacatttactccaccaacgaaactgaaattgacgatcataggaagaggtacgacagatacaaatcagatgagtgggcaatagaaactcatgccaaaattaatgattcccacacacttccttcttggaaagattatgctccagcctttctacctaatgatcctcatggtacatctcatttcagtatcgttgaccaatacggtaatgcggtggctatgacaaccactgttaaccttggatttggatctaaaatacacgatcctatatcagggattattctaaatgatgaaatggacgatttttcagttccaacatcatctaatgcatttggtttgcatccatcaatctataattgggtagagccttacaaaagacctctctcttcatgtgctcctaccgtaattgttg
attctctgggagtacctcattttgtcatcggggcagcaggagggtccaagatcactaccacagttttacaagcaattataagagtttaccattatcacctggatcttttagacgtcattgcatatccacgctttcatcatcaactacttccggaagaagttcttctggagtttccacgagataataaactaatacgccatctaaaagaaagagggcatgatgttagagtccaagcaccaatatccaccatgaatggtatcctacgaaaaagaggtggaagcctgatagcagttagtgatcactggagaaagcttggtcgaccttggggcttttga PAS_chr3_0633 32atgaaatcggttatttggagccttctatctttgctagcattgtcgcaggcattgactattccattgctggaagagcttcaacagcaaacattttttagcaagaaaaccgttcctcaacaagttgctgaattggtgggcacccattactctaaggatgagataatcagtctatggaaggacattgagctggatgtacccagggaaaagatccaagaggccttcgataagttcgtaaaacaatcaactgccacttcccccgttagaaatgaatttcccttgtctcagcaagattgggtgacagtgaccaacaccaagtttgataattatcaattgagggttaaaaaatcccaccctgaaaagctaaacattgataaggtaaagcaatcttcgggatacctggatatcattgatcaagataagcatcttttctattggttttttgaatcccgaaatgatccgtccacagacccaatcatcctatggttgaatggtggacccggctgctcttctattacagggttgctattcgaaaagattggccccagttacatcaccaaagagattaagccggaacataatccttattcatggaacaacaatgctagtgttatcttccttgagcaaccggttggagtaggattttcttactcttctaagaaagtcggtgatactgcaactgctgccaaagatacatatgtgtttttggagcttttcttccaaaagtttcctcagttcctgacctctaatctgcacattgctggggaatcgtatgctggccattatttgcccaagattgcttctgagattgtgtctcacgcagacaagacgtttgacctttcaggagtcatgatcggtaatggtcttactgatcctctaattcagtataagtactatcagccaatggcctgtggaaaaggtggctacaagcaggtcatttcggacgaggaatgtgatgaattggatagggtctatccaagatgtgaacgtttaacgcgggcatgttatgagttccaaaattcagttacttgtgttccggcaacactttattgcgaccaaaagctactgaagccgtacactgacactggcttgaatgtctatgatattcgtacaatgtgcgatgaagggactgatttgtgttacaaagaactggaatacgtggagaagtacatgaaccagcctgaagtgcaggaagccgtgggctctgaagtcagttcttacaaaggttgtgacgatgatgtcttcttaagatttttgtactctggcgatggatctaagcctttccaccagtatatcacggatgttctcaatgcaagtattccggttctgatttacgcaggtgataaagattatatctgtaattggctaggaaaccaagcttgggtcaatgagctagaatggaacttgtctgaggaattccaggcaactccgattcgaccgtggttcactttggacaataacgattatgcaggaaacgtacaaacttatggaaacttttcctttctaagagtatttgatgctggtcacatggttccttacaatcaaccagtcaacgcacttgacatggttgtcagatggacacacggtgatttctcatttggttattaa PAS_chr4_0013 
33atgactcaattagatgtcgaatcattgattcaagaactcacactaaatgaaaaggttcaacttctgtccggatcagacttttggcacaccaccccagttagacgtctaggaattccaaagatgagattatctgacggtcctaacggcgtccgaggaaccaagtttttcaatggagttccaaccgcatgttttccttgtggtactggattaggtgccactttcgataaagaacttctaaaagaagctggctccttgatggcagacgaagctaaagcaaaagctgcctcggtagttttgggtcctacagctaacattgctcgaggccccaacggaggaagaggcttcgaatcttttggagaggatccagtggttaatggattatctagtgctgcaatgattaatggattgcaaggtaaatatattgcggctaccatgaaacattatgtttgtaacgatttagagatggatcgtaattgcattgatgcacaggtgtctcacagagctctaagagaagtgtaccttcttccattccaaattgcggtaagagatgcaaatcctcgcgctatcatgactgcttataataaagcaaacggtgaacatgtatctcagtcaaagtttcttctagatgaggttttgagaaaagaatggggctgggatggtttgttaatgtccgattggttcggtgtgtacgatgcaaagtcttctatcactaatggtcttgacctggaaatgcctggtccacctcagtgcagagtccattcggcaaccgatcatgccatcaattctggggagatacacataaatgatgtcgatgagcgggtgcgaagcctcttaagtttaattaactattgtcaccagagtggcgtcactgaggaggatccggagacatccgataacaacaccccagagaccatcgaaaaactcagaaaaatcagtagagaatcaatcgtcttgctgaaggatgatgacaggaacagaagtatccttcctctgaagaagtcagataaaattgccgtgattggaaacaatgctaagcaggctgcatattgcggaggaggttctgcttctgttctctcgtaccatactacaactcctttcgactctatcaaatcacgattggaagattcaaacactccagcttacaccatcggtgctgatgcttacaagaaccttccgcctttgggccctcagatgacagacagcgatggaaaaccggggttcgacgccaaattttttgttggctcgcctacatctaaagatagaaagctgattgatcactttcagttgaccaattcacaagtcttcctggttgactactataatgaacagatccctgaaaacaaagagttttacgtagacgttgaagggcaattcattcctgaggaagatggaacctataactttggcttgaccgtattcggaacgggaagattattcgtggatgataagctggtttccgatagtagccaaaaccagacccctggagattccttttttggactagcagctcaagaggttatcgggtccattcatttggtcaagggtaaagcatataaaataaaggttctttatggatccagtgtcaccagaacatatgaaattgcagccagtgttgcttttgaaggaggagcatttacttttggtgcagcaaaacaaagaaatgaagatgaagaaattgctagagctgtggaaattgctaaggcaaatgataaagtggtgttgtgcataggtctaaatcaagactttgaaagtgagggattcgacaggccggatatcaaaattcctggagcaaccaacaagatggtaagtgctgttttgaaggctaaccctaacactgtgatcgtcaaccaaacaggaaccccagtcgagatgccatgggccagtgacgctccagtgatcttgcaggcttggtttggggggtctgaggcagggaccgctatagctgatgtactattcggtgactacaaccctagcggaaaactaacggttacttttccc
ttgagatttgaggataaccctgcatatctcaacttccaatccaataagcaagcatgttggtatggggaagacgtttatgtgggctacagatattacgagaccatagacaggcctgtgttattcccatttggccacggattgtcattcaccgaatttgattttaccgacatgtttgtcaggcttgaagaagaaaaccttgaagttgaggttgtagtcagaaacacaggaaagtatgatggtgctgaagttgtgcagttgtacgtagcaccagtatccccatccctgaaaaggcccatcaaagaactcaaggaatatgctaagattttcttagccagtggtgaggcaaaaacagttcacctgagcgttcctattaagtatgccacttcgttctttgacgaatatcagaagaaatggtgctccgagaaaggagagtacacaatcttactgggatccagctcagcagatattaaagtttcgcaatctattactttagaaaaaacaactttttggaaaggtttatag PAS_chr2- 34atgttcctcaaaagtctccttagttttgcgtctatcctaacgctttgcaaggcctgggatctggaagatgtacaagatgcaccaaagatcaa1_0172aggtaatgaagtacccggtcgctatatcattgagtatgaagaagcttccacttcagcatttgctacccaactgagagctgggggatatgactttaacatccaatacgactactcaactggttcccttttcaacggagcatctgttcaaatcagcaacgataacaaaaccactttccaggatttgcaaagtttgcgtgcagtcaaaaatgtttacccagctactctcattacattagatgaaacatttgagcttgctgacacgaagccatggaaccctcatggaattaccggtgtcgattctttgcatgagcaaggatatactggtagtggtgttgttattgcagttatcgatactggtgttgactatacacaccctgctctgggtggtggtatcggagataatttccctatcaaagctggttatgatttgtcttccggtgatggtgtcatcacgaatgatcctatggattgtgacggtcatggtacctttgtatcctccatcattgttgcaaataacaaagatatggttggtgttgcaccagatgctcagattgtcatgtacaaagtgttcccctgttctgatagtacttcgactgacatagttatggcgggtatgcaaaaggcctatgatgatggtcacaagattatttcgctatcactgggatctgactcggggttttccagtactccagcttccttaatggccagcaggattgctcaagacagagttgttttggtggctgctggtaactctggagaacttggtccattctatgcctcctcccctgcttctgggaaacaagtcatttcagttggatctgttcaaaacgaacaatggacaacctttccagtaacctttacctcttcaaacggtgaatcaagggtttttccttacctcgcttacaatggtgcacagattggatttgatgccgagcttgaggttgattttaccgaagaaagaggatgcgtctatgaaccagagatctccgcagataatgcgaataaagctattttgttaagaaggggcgtcggctgtgttgaaaacttggaattcaatttattgtctgtggctggttacaaggcttacttcttgtacaactcattttcaagaccatggagtctcttgaatatttctccactgattgagctagacaacgcttactctcttgttgaagaggaagttggaatatgggtgaaaacccaaatcgacgccggtaacaccgtcaagttaaaggtgagcacgagtgaccaaatgttgccatctgataaagagtatttgggagttggaaagatggattattactcctctcaaggacctgcttatgagcttgaatttttcccaacgatatccgctccaggtggagaca
gttggggcgcttggcccggtgggcaatacggtgttgcctcaggaacaagttttgcttgcccctatgttgcaggtcttacagctctttatgaatcgcagtttggaattcaagatccccaggactatgtgagaaaattagtctccacagctaccgatcttcaattatttgactggaacgcagtgaaacttgagacctctatgaatgctccacttattcaacagggagctggtctagtgaacgctcttggtttgtttgagactaagactgtgatcgtgtctgctccttatttggagctcaatgacaccatcaatagagccagtgagtataccattcaaattaagaatgagaactctgagactattacctatcaagttgttcacgttccgggaactactgtctactctagatcagcttctgggaacatcccatacctggtcaatcaagattttgcaccttacggtgatagtgatgctgcgacagttgctctatccacagaagagttggttttgggaccaggagaagttggtgaagtcactgtgatcttctctacagaagaaattgatcaagaaactgctccaattattcagggtaagattacattttatggtgatgtcataccgattgctgttccttatatgggagttgaagttgatattcattcctgggagcctctcattgagaggcctttatcagtgagaatgtatttggatgatggttccttagcatatgttgatgatgatcctgattatgagttcaatgtgtatgactgggattctcctagattttattttaacctgagatatgcaaccaaagaagtatcgattgacttggtgcaccctgattatagcattgagaacgactacgaatggcctttagtttccggacacaacaactattatggtcccgtgggatacgactacgattatacctcgggtcaagcctttttgcctcgttactttcaacaacgtattaacgaacttggatatctttctttttccagatttgctaacttttctgtagttcctgctggtgaatacaaagctctatttagagttttgctaccatatggagacttttggaacaaagaagactggcaattgtttgaatccccagtgtttaacgtcctcgctccaccgaatgaagaaaacactactgaagagccaactgaggaatccagcgaggagcctaccgaagagtcaacgtctgagtcaactgaagagccctcttctgagtcaactgagaaatctagcgaggtgccaactgaagaaattactgaagatgcaacatccacaattgatgatgatgaagcatccaccgaaagctctactgaagaaccaagtgctcagcccaccggtccttactctgatttgactgtcggtgaggccattaccgacgttagtgtcaccagtttgaggacaactgaagcatttggatacacttccgactggttggttgtgtctttcactttcaacactactgacagagatattactctcccaccttacgctgttgtacaagtaactatcccaaatgaacttcaattcattgctcatccagaatacgccccataccttgagccctcattgcaagttttctacactaagaatgaaagattaattatgactagtcagttcaactacgacaccagagtcatcgacttcaagtttgacaatcgagaccaagtaataactcaagtggagggagttgtttatttcacgatgaaactagaacaagatttcatttctgcattggccccaggtgaatacgattttgaatttcatacatccgttgattcttatgcttcgacctttgactttattccattgattagatccgagccaatcaaattgatagcaggtgcaccagacgaagttgaatggtttattgatattccaagtgcatacagcgatttggcaacgatagatattagttctgatatcgatactaatgataatttgcagcagtacttctatgattgctcaaagctcaagtacactatt
ggaaaagagtttgatcagtggggtaattttacagctggatcagatggtaaccaatacagcaataccaccgatgggtatgttccaattactgattctaccggctctccagtagctgaagttcaatgtttaatggaaagtatctcattgagtttcacaaatactcttgctgaggatgaagtattgagagttgttcttcactcttctgcgtttagacgtggttcattcaccatggccaacgtggtaaacgttgacattacagctggtggattggcaaaaagagaactcttctcttatatattggatgaaaattactatgctagtactggatctgaggggttggcatttgacgtatttgaagttgctgatcaggtcgaggagccaactgaggagtcaacctcagaggaatctactgaacaggaaacttccaccgaggaacctaccgaggaatcaactgaacctactgaggaatctacccaggaacctactgaagagcccaccgacgagcctacttctgagtcaactgaggaaccttctgaggagccaacttctgacgatctctcaattgacccaactgctgtacctaccgatgaacctactgaagagccaactgaggagcctacttctgagtcaactgaggaaccttctgaggagccaacttctgacgatctctcaattgacccaactgctgtacctaccgatgaacctactgaagagccaactgaggagccgacctctgagactaccgatgatccatcgatagcacctactgctgtgccaacttccgacacatcttctggacaatcggtggttactcaaaacactacagtcactcagactaccatcacttcagtctgtaatgtttgtgctgagacccctgtaacaatcacttacactgcaccagttgtgactaagccagtttcttacaccaccgttacttcagtttgccatgtatgtgcagagacaccaatcacagttaccttgacgttgccatgtgaaaccgaagacgtgacaaagactgccggccctaagactgtcacttacaccgaagtttgcaactcctgtgctgacaagcctatcacttacacctacatcgctccagagtacactcaaggtgccgaacgtacaacagttacatcggtttgcaacgtttgtgctgagacacctgtaacgctaacatacactgcgccgaaagccagtcgtcatacagttccttcacaatattcaagtgccggagagctcatttcatccaaggggatcacgattcctactgttcctgcccgtccaactggtacttatagtaagtctgttgacactagccaacgtacactcgctaccattacaaaatcttcagatgagtctaacactgttaccactactcaagccacacaagttttgagcggtgaatccagtggaattcaagctgcttcaaacagcacgagcatctcagctccaactgtcactacagctgggaacgagaactctggatctagattttcgtttgctggactattcacagttctgcctcttatcttgttcgttatataa PAS_chr1- 
35atgcagtttgcttccttactgcttctcttgtatattttcttggggcaaatttatcctactgaagcagcaaaatattttgttcgtctgaagaa4_0251gcctcacacactagacctcttgttcaaacaggatgaagcagatgcatctgctgagaaccgaatctctcttcatggtttaagggaccgaatcaaaaaaaagatctcttttggaacgttcgaaggttttgttggtgaattcacaacagaacttgtagaaaaactaaaaaagaattcgttgattgcagacataactcctgacattatcgtctcatcttgcgatatcgaattgcagtcccccgctcctgatcacctggctaggttatccaaagaaggtgccgtaagagcacaagatcgtcttcttggaccggaatttttctacgatggtgactggactggagaaggcgtcaatgtatacgtgatagacacgggtatcagggtaaatctagatgaatttgagggcagagcatcatttggtgctgattttacaggcactgggaaagatgactctgttggtcatggaacccacgtagctggtcttattggctccaaaacttttggagtggccaaaaatatcaacttgatatccgtaaaagctctctctggtaatgggagcggttcgctttcagaggtcctacaggcgattgaattcgcagtcaagcatatgaaagccagtcgtaagccaggtgttgctaacttgtctctaggtgcaccaaaaaattcaatccttgaaaaagcgattgaagaggcattcaagaacggtttagtcatagtagcagcagctggcaatgccttcgtggatgcctgtaacacatcccctgcaaactctccatatgcaatcaccgttggagctataggtgatcacaacgatgaaataactagattttccaactggggagcctgtgtcgatctttttgcaggaggggacacaattgtaagtgtaggacttctcaatggagtcgctgtccgcatgtctggaacttcgatgtctgctccaatagtcgcaggcttagccggaatattacttgaccagggtgtggccccagaagatgtaaaaggtaagttaatagagctctcagatgaagggaagatcaacgataatactggaattctaaagccgggaactccaaaccgaatagccaacaatggaattcgaaaaagtgattatgaagatcaaaaagaaaatgacaatgatgaagacgatgaagacggggaagacaatctagaagacattgaagaggacgaggattattgggatgaagagagaaggtatagggaatatgcggtatctagtttagtcttctaa PAS_chr4_0874 36atgttcaacattatccaacggatacagagtttgagcaatttttatttaacggtttccattctattatgtattgttacaacagttgtctcaattattagtatgttcttggatgaaacgtccagtattcctgcccaattaagcaatgttgtaatatcaacaaatttaaagtatagcagatcgtttggttcagtcggtggtagacctaaagaaaactccaagattttatttgatcttgatatggatctggctccattattcaattggaatactaaacaactgtttgtacaattggtagcagagtaccctacctctgttgccgatgatggtgcgaaggtgacctattgggatagcataattactgagaaaaagtacgcaagagtgcatgtcaataagcagaggggaaaatactcagtttgggacgtgtcggactcctttcaaggccgcaatgctacggttaaactgaaatggaacttacagccctatgtcggctttctattctttggacaaactaagggagagattgaggtggcctatcctgcaacataaPAS_chr3_0513 
37atgagtgtcatagtgcatcctcttgcactattgacaataatcgacgagttccagagacgaggtcgcaacaacgattccataatattcggtgggttacttggtaaacatgatgaatccaccaaccaaatatctgttgttaacagctttgtgataccattgatcgataatcagtttttgaataaagagtacttgcaggacatgctactcaaattttctatcattaattccaactttcgattcgtaggttactatcacgttcaatctttaaacggtaccgaaactcaacagtatgacttgaacgctattaacctagtatgccaagatgataataggccttcgtcctttgtccattggatagtaacagatccaaaagagttcaaatcattctcgatgtattacttggatgattcaatggttcaactcgtcaattccaatattcaacattacatttctaaaccattgccctatgaatttaaaaaccttctgtctgagaaaattgctatcgacacaatcctcaagcaatccaggctagaaaaagacttatccaccaaaaactcactgaagaaattaaacaatagttatatcgacattcattcctcactgaacgttctctataaatcagtcaataggcttattcgttacctcaaaaaatgctcaaaatcagaagtttcaattgactatgacacagttcaggaaatgaatactgtaatactgaaaattgaaaggcttaaattgataccccaagtcaaggaggagtttgacttagtgactctttcactactggtagacaatcttgatcagatggatcatcttttgtatctccggaaacaagtggaacagtacaaaatatctgaatcaatgtatagttag PAS_chr1- 38atgaaatttcactcgattgtcttcacattttcactcgttttgagttcactggcgttgtcgataccatgggtgtctgaccacatggtccagca1_0127tctttttgccgacccttcaatcagtaaaggtcctgatgtagatctcgttgggctacataagcatttggtcagcatcaaatctctttcgggctatgaacaagaagtagtatcgtggttggccgattatctagccagtaggggtcttactgtggagttgaacaaggtcgaggacgaaactgaacgttacaatttgtatgcttatttgggaaccacccgcaacactaaggttgtgctaacttctcacttagacacagttcccccttatcttccctacaaagttgaggaaggtggctatatctttggcagaggaagctgtgatgctaagggatcagttgcggcacaagtgattgccttcctaaatctcttggaagagggctccatcaaagaaggtgatgtcagtcttttgtacgtcgttggtgaagagattggaggtgatggaatgcgcacagctagcaagaccttgggtgctaaatgggacactgccatttttggagaacctaccgagaacaagcttgccattggacacaagggaattgcactgtttgacctgaagattacaggaaaatcctgtcattctggataccctgagctgggaattgatgccgacgctatgttggtccagattttgcacaagttgctttttgagacttcttggcctgtcagtgatttgctgggaaactccacagtcaacgcgggacagatcaacggaggagtagctgctaatgttatttcttcggaagcacatgccaaggttttaatccgcgtggctaaagacattgacgctgtagagaagctgatctacgaggccattgcccccttcgaggagtatacagacattacctttcactccaaagaagatgctactttcttggattacaaggttgaagggttcgagaactacattgcagcctacagtaccgatgtaccattcctagtgacgggctccaatttgaccagatatttgtacggaccaggaagcatcatggtggctcatgggcctgatgaaatggtcaaggt
ttcagacctgcaggatagtgttgacggatacaagcgattagtctccgtctcactttagPAS_chr4_0686 39atgccagagaaaaagaaacaaaaaaaagagtcgacatctccattcaagggtaacctagttgggatctcattggtagctgtggcattgtttgccatctaccagtacctctacccaagctcgttttcctctcagcctgaaaccccagccccagttttcgatctgagcagtgaattagaagcattgtgtcccgtgtaccctgcagtcagatcttccgacttcgaaaaggatcgccccatcttagagagaattctgaacgatccctcatttagaatcgcttctgctcaaaaactgagtaaggctgttcagatcgatacccaagtgttcgacgaacaattggacgtggctcaagaccctgaagtttggaccaaattcgtcaagttccatgaatatttggaggcaactttccccaccgtttactcccaattgaaggtcgacaaaatcaacacctatggcttggttttcacttgggaaggctcagaccctagtctgaaaccactcatgttcttggctcaccaagacgtggttccagtccagaaagatactcttcaggattggtcatatccccctttcgaaggacgtatcgccgatgacagagtttggggacgtggatcagctgattgcaagagtttactgattgcattactggaaaccgtagaattgctggtagatgaagggtactcaccaaagagaggtgtcatcctcgcatttggattcgacgaagaagcttcaggtacctacggtgctcacaatatctccaagtttttgcttgagaaatatgggccagatagtattgccctcattttggatgaaggtgaggctgtcagttacgtggacaagaaacaaactaccctcgttgcaaagattgctacgcaggaaaagggttaccttgacctagaggtcgcattgaccactgtaggaggccattcttctgtcccccctaagcacactgcaattggccttatttccaagttggtcacacatatcgaagatcatccattggacccagaaattagtaccagaaatcctctggtacagttttcgaactgtcttggtgcagctggggctttgagagatgacttcaagactgctcttgttgcatacagcaaggatccgtcgaacaacattgtcaaacaaggtgtgattaaaggtatttccaagattgcatttttcttcggttctttgattaccacaacacaagccaccgatcttattttcggtggagagaagatcaatgctttgcctgaaagtgctagagtagttatcaaccatagagtggacgttgagcgtgattcagcccaaatcatagacagattgattcacttccacgttgttcctattgccaaggagcacggtttcaaggtcacttacagtgactatggtagtgacaaagttgaaactgtctacgagccagaaggagttgcctcattgggagaattccacgtttctcctttctccagagtctgggagcctgctccagaatctccatccgacgacaatgtctggtccatcatttctggtaccactcgtacgatatttgaggagtttgtggacccctcggctaaacttattgcaagtccatacatgatgcctggtaacaccgacactcgacactactggccgctgacaaagaatatctatagatacgttccaggtattgtagatatttacaaggctaagatacactcggtagatgaatctaccgaggttgatgcccacttgcaagttatagctttctaccacgagttcatcaaggttgccagcgaatgggagctttga PAS_chr2- 
40atgaaatcctctaaagaactatacaaggaggctctcaactatgaatactcttccgcggtttctttcaaggcctgggttcgaagtgctcaaat2_0056cattttgcgacatgcccggcagtttgctgaacaaagatacatcagtgagtgctataagttgtctgttcgttttgtagacttgattgtgaacaagatggccacgcataaagagctcaagcaattgaagaaaataaatgcaccagtatatctcacctatttggatttggctacgaagaaagtcccagatgtcatcaaggaatgtgaggccttgaagacaattttggatgatgagtaccaaagctacctcaaactgcaacaattgaaacgacagaagcagaaagaccaattgatccatcatcagaatcaggctcaaacgcataaattacgtagatcttcatcaatattgaaagatcatatcaacgctgttgatgaaagagcgctgttgaaacaactacagcagttgacataccatgatcgtgaattcgcaaccgcaataacggagatgccaaattatccagagatcccccagctgagtatttcaacgaatcagaacactagatcagaggcacccccacttccaccaagagtatcgcaggaacagtcattagcaccagtatcactagattcatcacaggcagatttacaacacaaaactgttaacttcaccgaagctgggcaaccattacgaacagtatttatttcagatagactccaatctgagttccttagactagcggaaccaaacacgatacaaaagctagagacttgtggcatcctttgtggaaagctcgtcagaaatgcattcttcatcacccatttggttataccagatcaagagtcgacaccaaacacatgtaatacaagaaatgaggaaaagttattcgacactatagatcagcttgatttatttgtccttggatggatacatacccacccaacacaatcatgcttcctgtcttccatagacttacatacacagaattcgtaccagatcatgttaagcgaagcaattgccattgtgtgtgcaccagcacctcagttttctcatcattcttttggatgttttcggctaacccatcctccgggaattccaaccattacacaatgcactaggacgggatttcatcctcatgaggaacccaatctgtatgtgacttgtaatcgaaagaacatgggcgacgtgcaaggcggacacgttgtgatcaagaatcatttaccgtttgaaaagcttgatctaagataaPAS_chr2- 
41atgactagttctgtagataaagtgagtcagaaggtcgctgacgtaaaactgggctcctccaagtcaacaaagaataacaagagcaaaggtaa2_0159aggaaaatccaacaagaatcaagtggttgaggatgatgatgaggatgattttgaaaaggccttggagcttgcaatgcaattagatgcacaaaaactagctcagaaaaaagctgatgatgtgcctcttgttgaagaagaagagaaaaaagttgaggaaaagattgaacagcaatatgaccccatttccactttttaccctgatggaaactatccccaaggagaagttgtggattacaaagatgacaacttgtaccgtactactgatgaagaaaagcgagctttggatcgagagaagaataacaagtggaatgaatttcgtaaaggtgctgaaattcataggagagttcgaaaactggcaaaggatgagatcaaaccgggaatgtcaatgatcgagatcgccgaactaatcgaaaacgcagttcgtggatatagtggtgaagacggactcaagggtggtatgggatttccttgtggtctttctttgaaccattgtgctgcgcactattctcctaatgctaacgacaaacttgtcttaaattatgaagacgtcatgaaagtagattttggtgtccatgtgaacggtcacattatcgatagtgcattcacgttaacattcgatgacaaatatgatgatctgttgaaagctgtcaaggatgctaccaatactggtattcgtgaagcaggtattgatgtgagattgaccgacattggtgaagccatccaagaagtaatggagtcctacgaagttactttagacggagaaacataccaagttaaacctatcaagaatctttgtggccataacatcggccagtatagaattcatggtggtaagtctgttcccatagtgaagaattttgacaacaccaagatggaggaaggtgaaacctttgcaattgaaacctttggcagtacaggaaggggtcatgtgataggacaaggtgaatgctctcactacgccaagaatccagatgcccccgccaatgctatctccagcattcgtgtgaaccgtgctaaacaattgctaaagactatcgatgagaactttggtactcttccattctgtcgtcgctacatagatcgtcttggagaagaaaagtacttattggcattgaaccagttggttaaatctggagttgttagcgattatccacccttggtagatgtcaaggggtcatacactgcccaatacgagcacaccatccttttgagacctaatgttaaggaagttgtatcccgcggtgaagactactagPAS_chr3_0388 
42atgattcacagctgtgctagtgctgagtgctcaaaagcgactgaatctaccttaaaatgtcccttgtgtctaaaacaaggtcagatccaatatttttgtaaccaaaaatgtttcaagaatggatggaagatccacaaagcggttcacgccaaagatggtgatatagatggttcgtacaacccctttcccaactttgcctacaccggtgagctcagaccagcatatcccttgtctgtgagacgagaggttccagagaacattactctcccagattatgctcttgatggagtaccagtctcagaaatcaaaaataacagaatgaacaagatcaatttggtaacggagccagaagacctggccaagctaaaaaatgtttgccgtttagcacgagaggttctagatgctgcggctgcatctatcaaaccaggagttaccactgatgagatagatgaaatcgttcatagtgaaacaatcaagagagaagcatacccctcccctttaaattacttcaattttcccaaatctgtttgcacatccgttaatgaagtcatctgccacggtatacctgatcgtagaccgctccaggatggtgacatcgtgaacctggatgttaccctttataaagatggatttcatgcagatctgaatgaaacgtactatgttggagagaaggccaagactaacaaagatctggtcaacctcgtcgagacaaccagagaagctcttgctgaagctatccgtttagtgaaacccggcatgccgttccgtcaaattggtactgttatcgaaaactatgtgactgaaagaggctgtgaaactgttcgttcttacactggtcatggtatcaatactttgttccacactgaaccaaccattccgcattacgctcgtaacaaagctgttggagtagccaaaccaggagtggtattcactatcgaaccaatgttgactctgggcactcatcgtgacgtggtttggcccgacaactggaccgccgttaccgctgatggaggaccaagtgcccaatttgaacatacccttttggttacggaagatggtgtggagattctcactggcagaacggaaacttcgccaggcggtgccatctcaagactataa PAS_chr3_0419 
43atgctctataagaccaccttgtcaatagcacacacgagtgtgatattgttgtcattgataaccgccataagttgctttgagttgcatcttcctcagaaggtttctcatatagtagacagtttacaatatacttgcggccaatttttgcaaaagcagcagatctttgcactctataacaagcaaaatttcaccgaaatagtgaaccagaatatcaagggaatagaggagagagttttgtctgagttgcttgaagaaagattagagaatgaatcccagaatgattattataccgccaattctcaaaattggcctatcgacttggatcagtactcagaatcatttgtaataaggatcacatctgaagatgagtttatcaagtacttgatcttcaaggaagctaaagctttgcatatttccatatgggagcaatctgttggtttgatagatttgaaggttgaccgtgatcagatgcaccgcctactttacaacgtggagtcacgcatactggaacgaagaacgagaagtgttgacagtccagtttctgaatataaagtacaattgatgattggagatcttccacagcgaatctacgaaacatatccttcgacaaaagtgacatctttgcaagccctaggagagttcccttctttccagaacctaagtaatgctttttttgaggattttagaacgctggaaactatatacgactggttcgaagaaatacagaaggaatttcctaagctagtgtcgatcaactggattgggcaaacttatgaaggtcgtgatctgaaggctcttcacgttagagggaagcactctggcaacaaaacagtagtcgttacaggtggaatgcatgcgcgtgaatggatatcagtaaccagtgcatgctatgccgttcacaaactgctccaaaactatgctgacggacaccacaaggaagcgaaatacctggacaagttggactttttgtttgttccagttttgaatcctgatggatacgaatatagctttaacgaagacaggttgtggaggaagaacagacaagaaacttatatgccccgatgttttggtatagacattgaccattcatttgattatcatttcgtgaaatcagaagacttaccctgtggagaggaatattcgggtgagtcccctttcgaaagtatagaaagtgaagtgtggaataatttcctgaacagaaccaaagaagaacataagatctacggctatatcgacttacactcgtattcgcaaacggtgctgtatccctatgcgtactcatgcgaaatcttaccaagggacgaggaaaacctgattgagctaggttacggtattgcaagggccataagaaagagtacagggaaaaaatatcaagtgttgaaggcatgcgaagacagggatgcagatctattgcctgatttgggaggaggaaccgctttagattatatgtaccacaaccgtgcatactgggcgtttcagatcaaattgagggattccggtaatcatggctttctccttcccaaaaagtttatatacccagttggaacagaggtttatgcctcaattcagtacttttgttcttttgtgctgaatttagaaggctaa PAS_chr1- 
44atgaaattgaccataacattagcccataacgatcaaatcttggacattgatgtgtccagtgaaatgctactatctgacctcaaagtcctgtt3_0258ggagttggaaacttccgtacttaaaaacgaccaacaattattttacaataacaacctgctcactggagatgactcgccactggaagatttaggactcaaagataatgaactcataattctgagcaaagtcgaagcacatagtgatgtcaattcacacttgaactctgttagagaacagttgatacaaaacccgctataccaggccagtttacctccaagtcttagagataagctcgacgaccctcaaggcttcaaagaagaagtggaaaaactaatccaattggggcagtttggacaatacgggccttcccgtacttccgtccaacaggaattagacagactacaaagagatcctgacaatccacaaaatcagaaacgaattatggagctcattaacgaacaagctatagaggaaaatatgaatactgcttttgaaatctcacctgaatctttcgtttccgtgaatatgctctatataaatgtggaaattaatggtgtccattgtaaagcattcgtcgatagtggagcccaaacgaccataatgtcccctaaactcgcagagaaatgcaaccttgcgaatctaattgataaaaggttccgaggagtcgcacagggtgtaggaagttctgaaatcattggtcgtatccattctgctcccataaaaatcgaagatattattgttccctgctcattcactgttttggataccaaggttgaccttctattcggacttgatatgttgagaagacatcagtgtgtgattgaccttaagaacaactgtttacaaattgcagacagaaagacagaatttttaggagaagcagacatcccaaaggaattctttaaccaaccaatggaagctccatccacagctcctgtcccaaaacctgtacaacctcctcaacaactcggtcagcggccggctggaagccctccctccacaattcaaagaccagcagtacaaccgccacctgtggatatacctccagaaaaaatccagcagttgatcaaccttggattcggagaagaggagtcgaaagaagcacttattagatctagaggaaatgtggaagttgcagcggctttgttattcaactagPAS_chr4_0913 
45atgccaaaccttccttctagcttgaacaagatgactgctcaagccgtgaaatacgcaaacggtatgtcatctgccctctcccgtgtttgagactctatccactaactttagattttatcaccttcctgaacaattcacctactccataccatgctgtcgactccgtaaagtccaaattggtagagtcggggtttaacgagctcagtgagagagttaattgggccggaaaagtcaagaagaatggcgcttactttgtgactcgtaacaattcgtccattatagccttcactgttggcgggcactggcagccaggtaacggagtgtcaattgttggagcccatactgattccccaaccttgagaatcaaacccatatcccattcgactaaggagggatttaaccaagttggaattgaaacttatggtggaggcttgtggcatacgtggtttgacagagatttaggagtagctggacgagtgtttattgaagaagaagaatctggtaacattgtgtccaagttagtcaagatcgataaaccagtattgagaatccccacactagccatacaccttaccaaagagagagctaagtttgagtttaataaggaaactcaattccatccaatctcatcgcttgaaaactcctctgaaaaggagaaaaacaaagatgaggaacatgacgcttgtgcaggagaagatttgactacggaggagtttaagtcaattcaatctgttgtggagagacacaacaaacaattgcttgatctggtggctgcagatcttgattgctctatatcccagatagtggactttgaattgattcttttcgaccacaacaaaccagtactcggaggtttgaatgaagaatttgtgttctcaggaagattggacaacctaacttcttgtttctgtgccactgaagcgcttataaatgccagtaaagataccaacaggttagatctggatactaatattcaactgatctctctgtttgaccacgaagagattggatcagtttctgctcaaggagctgattcttcatttctacctgacatacttcagcgtataacaagactaactggtaatgaggttagcaccgatctggaaggacaaccaaattctttctttttagagtcaatggccaaatctttcctactatcttcagatatggcacatggtgtgcatcccaactatggggaagtctatgagaagctaaataggccaagaatcaacgagggaccagtgatcaaaataaacgctaatcaaaggtacagcaccaattccccaggtattgttttgctcaagaagattggtgagttgggaaaggtccccttgcaattgtttgttgttagaaacgactctccctgtgggtcaacaattggtccaatgttgagtgctaaacttggacttcgaacgctggacctcgggaacccccagctctccatgcattctatcagagaaactggaggtgctcgtgacgttaaaaagttggtcgatcttttcgaaagctattttgagaattattacaccttggagcctaagattaaggtataaPAS_chr1- 
46atgaacaaaggtccgaaagaattggagggccgcaagtatccagcaagagcccatgcactgacggtcaaaaatcactttatccaaaagaaggc1_0066tgacatttcaagtcgttctgcaatctttattagtggcgaagatctcaagttgtatccttactgtgaccaaacagctcctctcagacagaatcgttatttcttttatctgtcaggttgtaatatccctggatcccatgtcctttttgacttggacgccgaattgttaattctggtgctaccagaaattgattgggatgatgtcatgtggagtgggatgcctctttcgattgaagatgcctacaagacgtttgatgtggacaaggtggtatatcttaaagatttgcaaggctttttgtcgtcgtttggaaaaatatatacaactgacatcaatgatgaaaattctaagtttggcaatctactaacagagaaagatcctgacttgttctgggctctggatgaatccagattgatcaaagacgactatgaactcactctaatgagacatgcgtcaaaaatttctgacaattcccattacgctgtcatgtcggctcttccaattgaaactgacgaaggccatattcacgctgagtttgtttatcattcgttaagacagggatctaaatttcaaagttatgacccgatttgttgcagtggaccaaactgtagtacccttcattatgttaagaatgacgattctatggagaataaacacaccgttctaatcgatgctggtgcagaatggaacaactatgctagtgacgttacaagatgttttcccatcaatggagattggacgaaagagcatcttgagatctataatgctgttttggatatgcaggaccaagttatgaagaagattaagcctgaagcccattgggatgagctacaccttttggcacatcgtgttctcattaagcattttttgagcctcggcatatttcataacggaacagaggatgagatatttgagagtggagtctcagtatcattctttcctcatgggctgggtcaccttttaggaatggatactcatgatgttggtgggcaccccaactatgatgatccaaaccctctattgagatacctaagattgagaagagtgttgaaagaaaatatggtagttacgaacgaacctggaatctacttctctccctatcttgttgaattgggactgaaggatgataataaggcaaaatatgtcaacaaggatgtactggaaaagtattggtatgtcggaggtgtgagaattgaagacgatattcttgttacgaaagatgggtatgaaaacttcaccaagattactagcgaccccgaagaaatttccaaaatcgttaaaaaggggttggagaagggtaaagacgggttccataatgttgtatga PAS_chr2- 
47atgacatctcggacagctgagaacccgttcgatatagagcttcaagagaatctaagtccacgttcttccaattcgtccatattggaaaacat2_0310taatgagtatgctagaagacatcgcaatgattcgctttcccaagaatgtgataatgaagatgagaacgaaaatctcaattatactgataacttggccaagttttcaaagtctggagtatcaagaaagagctgtatgctaatatttggtatttgctttgttatctggctgtttctctttgaccttgtatgcgagggacaatcgattttccaatttgaacgagtacgttccagattcaaacagccacggaactgcttctgccaccacgtctaatcgttgaaccaaaacagactgaattacctgaaagcaaagattctaacactgattatcaaaaaggagctaaattgagccttagcggctggagatcaggtctgtacaatgtctatccaaaactgatctctcgtggtgaagatgacatatactatgaacacagttttcatcgtatagatgaaaagaggattacagactctcaacacggtcgaactgtatttaactatgagaaaattgaagtaaatggaatcacgtatacagtgtcatttgtcaccatttctccttacgattctgccaaattcttagtcgcatgcgactatgaaaaacactggagacattctacgtttgcaaaatatttcatatatgataaggaaagcgaccaagaggatagctttgtacctgtctacgatgacaaggcattgagcttcgttgaatggtcgccctcaggtgatcatgtagtattcgtttttgaaaacaatgtatacctcaaacaactctcaactttagaggttaagcaggtaacttttgatggtgatgagagtatttacaatggtaagcctgactggatctatgaagaggaagttttaagtagcgacagagccatatggtggaatgacgatggatcgtactttacgttcttgagacttgatgacagcaatgtcccaaccttcaacttgcagcatttttttgaagaaacaggctctgtgtcgaaatatccggtcattgatcgattgaaatatccaaaaccaggatttgacaaccccctggtttctttgtttagttacaacgttgccaagcaaaagttagaaaagctaaatattggagcagcagtttctttgggagaagacttcgtgctttacagtttaaaatggatagacaattcttttttcttgtcgaagttcacagaccgcacttcgaaaaaaatggaagttactctagtggacattgaagccaattctgcttcggtggtgagaaaacatgatgcaactgagtataacggctggttcactggagaattttctgtttatcctgtcgttggagataccattggttacattgatgtaatctattatgaggactacgatcacttggcttattatccagactgcacatccgataagtatattgtgcttacagatggttcatggaatgttgttggacctggagttttagaagtgcttgaagatagagtctactttatcggcaccaaagaatcatcaatggaacatcacttgtattatacatcattaacgggacccaaggttaaggctgttatggatatcaaagaacctgggtactttgatgtaaacattaagggaaaatatgctttactatcttacagaggccccaaactcccataccagaaatttattgatctttctgaccctagtacaacaagtcttgatgacattttatcgtctaatagaggaattgtcgaggttagtttagcaactcacagcgttcctgtttctacctatactaatgtaacacttgaggacggcgtcacactgaacatgattgaagtgttgcctgccaattttaatcctagcaagaagtacccactgttggtcaacatttatggtggaccgggctcccagaagttagatgtgcagttcaacattgggtttgagc
atattatttcttcgtcactggatgcaatagtgctttacatagatccgagaggtactggaggtaaaagctgggcttttaaatcttacgctacagagaaaataggctactgggaaccacgagacatcactgcagtagtttccaagtggatttcagatcactcatttgtgaatcctgacaaaactgcgatatgggggtggtcttacggtgggttcactacgcttaagacattggaatatgattctggagaggttttcaaatatggtatggctgttgctccagtaactaattggcttttgtatgactccatctacactgaaagatacatgaaccttccaaaggacaatgttgaaggctacagtgaacacagcgtcattaagaaggtttccaattttaagaatgtaaaccgattcttggtttgtcacgggactactgatgataacgtgcattttcagaacacactaaccttactggaccagttcaatattaatggtgttgtgaattacgatcttcaggtgtatcccgacagtgaacatagcattgcccatcacaacgcaaataaagtgatctacgagaggttattcaagtggttagagcgggcatttaacgatagatttttgtaa PAS_chr1- 48atgacctgccaaagtgtagaagagctggatgctattgttgaatcaaagcttagggaggttgataataaagtttcgaacggaaatgttgactt3_0261catcaaacaatatctgattcaggcgatgaactattatgacaagtatagatctgaaatcaaaaaaattggacccacagaaaagaaccctaaatactattgttttcaagaggcagcgtatgttaactacaaagcttcccaagctttactaagagagagaatacccaagctgcctggctttggaggatataaatctgcgtattcaaaaatctatcgtgaactgatagaaatggtagaggggcaagaacatgagattgcccagataaaaagcggcttaaggaaaaacttttgtgatgatacattagttcttcgactgagaagtttaaaatcaccatctgctactcagcccaaaagtttaccggattctacacccacttcacaatttaaaccaaaaccttcaaagccttttagtatcacaatcaatgaggaatacatttcggttgaccaattgtcacgccttcttaaaacgaacccgaatgacatactcctcattgatctacggtctcgtcaagagtacgacgtgtatcacattgaagatggctccggggtggacatgtcaatatgtatagaaccaatgagtatcagaaacggatacacagcagaggatctttatcaactttcaatggccgtcaatccagattatgaaaggagattgttcaagaatcggtctcagtatgaactgttggtatgttatggtaattatgacaacgaggctactgttcaaatgttcatgactatcatgaataaagatacttccctcaagaggcggagcgtctatttgaaatccggaattaagggctggaatcaggatctgagttttcaagattcgaaaccgaatgggtacttaactagtacgactgactacttcagtaacactccgaaacacacaattacgcccaaatcatcaaaatcaagttcaaaacctactttaaaaactactgtcaactctgggcctgcccacactgttgggatcaataatctaggaaatacatgttacatgaattgcatacttcaatgcctattagaaagtgataagtttgtttcattttttttacaaggcgattataagaaacatatcaatattaatagccgattaggctcgagaggtatattggctacaggatttcatttgttagtgctattaatatccagatcatctggtaaaacagtgactccttcttcatttgccaaagatgtttcaacagtgaataagaattttaagttaggagagcaacaggattgttttgaatttttagattttctcctggatag
tttacatgaagacctgaatgaatgtgggaatgaaccaccaatcgcagaactcacacctgaagaggaaaagcttagggaagctttacctatcaggattgcttcgaccattgaatgggaaaggtatttaaaaaacaattttagcatagtagaagatgtgtttcaagggcagtacttctccagattggaatgtacagtctgtaaaagcacttcaactacttataactcattcagttcactgtccttgccaatcccattagatcgacaaaatgtcacactagatgactgtttccaggctttttgttctgtagaagaattgaacggagatgacagatggcattgtccaagctgtaaaaaaaagcaggtcgcttttaagaaacttggtatctctagactaccaagtgttctgatcgttcactttaaaaggtttcaggtcaagtgggaaacaggtcatataatcaagatagacaagtttatcagttatccgttcaagctatcaatggacaaatattggcccaaagctcaatcagaagaagaactaagaaacttggagaagctaccatcgagaaatcagaatccccctttcaattatcgattgacaggggtggctaatcattttgggaccagaacatcatctggtcactacacatcatatgttcaaaaaggtggccaatggtattactttgacgatagtgctgtgactagcaatgttgatcgtcataaaatcgtaaatgggaacgcctatgttttattttatcgacgtagttag PAS_chr2- 49atggaagccgtgaatttacaaattgaatggattagacaggtgcctccagttactgtggctcttgtagcatccatgtcaatgacctatttttt1_0546gcaacgcatagatgtattatcctcaaatatgttcgtgtttgaaagacatcgtgtgtttaatgagatggcctattctcgtttgatactaagtttcttcttcagcgcccattcgtttgttggattcttttggacattgtacacattatttcagaattcacaggcactcgagctgacctatgaaaactcaatcgattacctctactcattggtgataatagcaggtttgatcgtggcatgggcctcatacttggggggtccgttcatgctgggatgggttctagctgacgtcttgagaaccatatggtgcaaacagaatcccaacgaaagaatgtctattttggggctagtttccttcaaggcaggatactttccatttgtaatacttgccatttcatggctagaaggaagttcaagaaatcttctattaatgctaattagccaaactgtcagtcaggcttatatttttggacaccatatgatgcccgaactacacgggatcgatctgtttctgcctatatggaaattccagtgtttcagacgtcagagacaaccaccaattcatcagcatcaagactaa PAS_chr2- 
50atgtcaaaggtggtggtattcctaaatggattattggcaataacctttacgtttgaacttctctctgttttaagcgtgccaatcaccaagca2_0398tatccaactttgttcttatcaaggatataagtttggcgtgtttggatattgcaccgagaataatatctgcacaacgataggaatcggttatcatcgaaattcaatagacgaattgagaggcttttcattaccaagtaatgcaagaagctctatatcaagcttgttggtggttcatttgattggctgtgtttgcacctttattttatgggttctaagtctcatgttgaatatggatagatttcacagatcattatggttcttattaacgtgtctagtatggacttgtgctttctttttttttacattattctccttcctggtagacgtgttactatttgtgccacacgttgcgtttggaggttggttgatgttggtaagtactgtatttttggcatttacaggaaccattttttgcatcatgcgaagaactgtcagctcaagaaaaactcatttgaagaactacaacgggggaagtacaagtttgatgcggctgcagacgtatatctccaatagctctagaggaagctctgtaaccaatgatgaatacgtctggtttcaagaaactccattacaagacctctaccccccagacaatcccaattacgacgacatctacggaacgactgaacacgaactaacccgcttggacacaatatctcttgaaaggccaagaataggccttatcacaaacgaaaatgccagcggcgatggtggggtagtttccccaccacagaatgacagtacacttctggaatcttcgggcagaattaggaatgggccactgggagaccgaagtgaatttcccaacggatcaacaagcgaactttctgcataa PAS_chr4_0835 51atgaaatacagtgaccaattaatagaagagtacaaagaattatggttaacagcgacatctaatgagcttactagagaatggtgccagggaactctccacctgagcaaattatacgtttacttgacacaagacttaaagtattttggggatggatttcgacttttaggcaaaaccatttcgttatgtcgccgtaggcaatcgcttgtgtcattaggcaaacatgtggggatgctcagtaatagtgagaacacgtacttcgtggattgtattaacgatcttactgaacagttattaagagatgggatgtacaatgctgaagaattagaagaaatcagtggtttaacgttacctgccgtggaaaggtaccttttattcatgagatcgatggtagagtcttctacaataacttatgcagaaatgattactgtgatgtttgtaatggaacaagtctatctggattggtcaaataatggactgagaagtaaacctgacaacttgcattggtggttcaatgaatggattgatatacatagtggggagaactttgaaagctggtgccagtttttaaaggatgaggtagaccgctgtatacaggagttgaaggatgctaatagagatgatctcgtggcgagggttgaggagatttttagagaaacattagaacttgaagtcgaattctttaaaagttgttacgatatcacggacgatgaatgaPAS_chr1- 
52atgcactcgaaatttaggtgggtatgtgtcgatactcaattctgcacacaccaccaaaatctgtcgcctttctcttatatctccaacccgag1_0491tccaatgtcattttcttaccttgaaggcaacatcgattttaaaggacaggaacttgcaaacaggatcactaaaaaactaatcacatttggtgcaattattagttttctggtaggatttttgagtgacaacatcttatacactgtatacactttcgcagcttttggtttattgactgcttctttggttattcccccttttagcttctacaaaaagaaccctgtaacatggttaccaaagaaatccaaaatagagattcagcattgaPAS_chr2- 53atgacagactctgttaactctgatgattctgatctggaaatcatagaggtgactgagcctactccaaaagtggaccttttggcccccaatcc1_0447agcatttaattttactgcccccataagcaacagtaacggcacaactccaataaggagaaaacttgatgaccaatccaactccaattcttttgccagactggaatcgttacgggaatcatcagtgaaaccacaagctagtacgttcaatagtagtaggttcatcccccaagccgaccaattttccaataatcagaataatgaacttgataacaacaatggattcgccgactggatttctaagtcccaacctgaatttccctttccacttaatgatggaccaaaaaagtccagcaatcaacctacaaactcaaattttgaagagatcatcgatttaactgaagatatcgagataaatacatctgtccccgcatctacatcatcttctaccccagttccctccagcacacagaatcagagccatcatatagccaacaacaacacagcacaagatgcgcatatcttccaagggaaacgacctctccaatcatattcagatgatgaagacgaagatttgcaaattgtaggatccaatattgttcagcagcctctaggaattatgccaggaactttcaacgcccctgcaaacatactccattttgacggttcaaaccagaatgaacaagccagatggctggacttgcggataaaagatttgttagataatcttcacaatcttcgagttcatgctcagtcgaatattatggagatcaataggttcatttccactttggggcatttaaacagagaagtttcagagctcaatctaagatatcaatctatcgtgaacaatcctcaggcgaccgctaataatcaaggatacctcactcagcttttgaacaggattcaggagcttactaatgaaaaagcgcacatatttagagagatggatacatccaagataaaacagcaggagattcacagaagaatccatgctctctcgtcaacaattgacaaactgaaaaaagatcgtgaacttatctttcgaaatgctcaaaatgcttttcacggtgatatgaagaatgaagttttggaaggccagtctttcatggatgcaattcatagggcaaatagcttgggttatgcttcaaatatttattctcgttctgatgaagacgctggaagcttacaacggcttcttgaaaatatccagcccgatatggaggacaaagacgatgatgaattggctaaaactccgaaggagttcaatattcaactgctgaagcatcagagagttgggttagattggctacttcggatggagaagtcaaccaacaaaggaggcattttagcagatgccatgggcctgggaaaaaccatccaggctattagtattatttacgcaaacaaatggaaaacacaagaagaagccgaagaggaggcaaaacttgaagagaaggttagatccgaaaagtctacatcagaaacgaatggagaggtcagcaaaacgtcaacggcaaagtcggaaaagaaacccatccaaggagacgaaggatatttcaaaactacgttaataatagcaccagtt
tctcttctacatcagtgggagtctgaaatcttgttaaagacgaaaccagaatacaggctaaaagttttcatttatcacaagcaaaaaatgtcctcgtttgaagagctccaacagtatgatatagtattaacatcgtatggaactctgtcttctcaaatgaagaagcattttgaagaggcaattaaggaggcagacctacagcccaactcttcatccataccagcagaagactctggaggcatatctttcaagtcaccattttttgcaaaagaaacaaagtttcttcgagtcattctagacgaagcccataagatcaaaggaaaaaatacaatcacttcgaaggcagtcgctttggtgaagtctaaatacagatggtgtttaacgggcacaccgctacaaaataaaattgaagaactatggcctctacttcgattcttgagaattaagccatattatgatgaaaagcgatttagaactggcatagtattacctataaagagttccatgtcaggcaaatatgattccacagacaagaagattgctatgaggaaacttcatgccctacttaaagcaatcttgttgaaacgaaacaaagattcgaagattgatggagagcccattctcaagttacccaagaagcatatcattgacacattcatagaaatggaagcaaaagagttagacttttacaaggatctggaaggacagacagccaaaaaagccgaaaagatgctaaacgctggaaagggacaaggaaatcattattctggtattcttatcttgctattgagactgagacaaacttgttgccaccatttcctcgtgaagttatctgagatgaagcaagaagccaaattgaaacaggaagttgctaccaagatgccacaattggccacacaactatctcctgctgtggtaaggagaattaacattgaagcagaggccggatttacgtgtcctatatgtttggataacatcataaatgagaatgcttgtatattatacaaatgtggacatgttgtttgtcaagattgcaaagacgatttcttcaccaattatcaagagaatgaaactgatgacggtcttagagtgtccaaatgtgtgacctgtcgtttgcctgtcaacgaaagcaatgtaatcagtttcccagtctacgacaagattgtgaaccagcatatttcagtgatggatatagttaaaagtgagtctccagtgttgtcaaaaattgaaatgattcaacaactgatccgggagaacaaaggcgtcttcgaatcgtctgccaagatcgataaagcagtggaaatgatacaagagttactgagagacaatccaggggagaagatcatagtttttagtcaattcacaactctcttcgatgtcatagaggtaatactcaaagagaacaacattaaattcattagatatgacgggtcaatgtctcttagcaatagagatgctgccattcaagagttttatgagagtacggagaaaaacgtaatgcttctttctttgaaagcagggaacgtggggttgacattgacttgcgcctcccgtgtcataataatggacccattttggaacccatatgtggaagaccaggccatggatagagcccatagaattggccagttaagagaagttttcgtctatcgaatgttgatcaagaacaccgtcgaagatagaattttgaccattcaaaatacgaaaagagaaatagttgaaaacgctctggataaccagagtttgaatacgatatccaagcttggcaggaacgagttggctttcttatttggtatcggcaattga PAS_chr1- 
54atggagtgtaaaaaagtcaaagatcgcctagtcacggaatacttaaagattgaatgtagtcgacttaaccgaaggatacgctccctgaaaaa3_0053tccaaaagttgagcaagccctactgcaattcaagaactcacgtttggctcacatgagaaaggctcatctggatggaataagaaacccacagtatacggatgacgccatctttcaggcattggaaaccatggatttggaccacatatttgagaaggcaggtagtctttacaactcacagcaacaagatgaatcaaaaaaagattccctggatgaaacagatttcaccgtggtggcgttgctagattggttcaagaatgacttcttcaaatgggtaaacaagccaccttgtcctgtttgccatagtgaagatgaaagccgcataagaatggtcggatctgcaaggcccactagtgaagaattgtcgtacggagcaggggtcgtagaggtgtttaattgtgaccattgtagctgtgcaatcagatttccaagatataacgaccctaagaagctcctgagaactagagctggacgatgtggggaatggaataactgttttctgttgtgtctaaaagccttgggtctgaaagctagatgtgtgaggaatgtggaagatcatgtatggagtgaatactactcggaacatctcaagcggtgggtccatctggatagttgcgagaatgcctttgatcaaccagaactatactgcaaaggttgggggaaaaagatgagctattgttttgcttttgatgacactctcatagaagatgtgagtgccaagtacattactcaaggtagactgcctaaaatgctagacgacgaaaccatcagaatatgcttgtattttttcaaccaggaagctcttaagatggtgagtgaaaatccagaggcattctactccgctttggttaagtatcacagatgtctgtctgcgaatagaaaagagagcgggtcaaaatcacgagccgtgaatgctagtttgacttcattgttaccacgacaatctggtagcgcatcctggacgtctgagagaggcgaaaacggactttagPAS_chr3_0200 55atgcctataaaggggcggttcaccaaaaagaagccaaaaaggaaagatgagccaaatcgaccgtcccccacccagttcatcaaaaaaatagcctcattgaaaaagcagaccaggagagatgaggccctggatgtgctacacgaactagcagttgttgtgtcacctttgatgaaagagaacggtttcactgttggattattatgcgaaatgttcccgaagaatgcctctttattggggctgaatgtgaatatgggttcaaagatcatgatccgattgagacctagccacaacatgaacttgtttttgccaaaaagagagatcatcggtacaatgctccatgagttaacccataatcgcttttcggcccatgatgtaaggttttatgactttcttgagggtctcaagagcaggttttttgagattcaggtgaaaggatctttacaaactacagggtatgttaactttagtgaagttctatctggtaatgcggcgagagggcaactgattcaaaaggaaaaagagaaaggacaaagattgggtggtaataagcatgcaaaacctatgagagtcctaatcttggaggcggccgagaagagaatgatagactctaaatggtgcggaggagctagcaatgaagtaggccttccaaaaattgaagatctaatggacgatgaagaagctcaacactctgaactaaaggaagagaatacaaagaaggtcagaaaaattgttcaacctagcaaaaagaaaattgtagatttggaaaacctaccgaatggcaagtccattattattgatctaactaatgacgatgactaaPAS_chr1- 
56atggaacacaattgtctgaaagtcaatgaattggcgctccagttggctcaatcactgcagaacagcaaagtcagcacagctgatcctctaaa3_0105gaagaggacaagcagctacagaggcctgagtagcgagcctataatcacagaggaagaaccaacaatcaagggcgactataatagattttacagtcagtcttcagataagcaagtattggacaataaaccatggttgcaggatggaaactatttcaagactgtatacatttcaacgatagcactactgaagatgatgtctcatgcccggtccggtggttcaattgagattatgggcatgctgacaggtaaggtgtttgccaacacattagtcgtaatggattgctacttacttccggttgaaggtacagagacacgagtgaatgctcaagcggaaggatatgagttcatggtctcttatttggataacttaaaggaaatcaagcataacgagaatatcataggatggtatcactctcatcctggttatgggtgctggttgagtggaattgatgttgccactcagaatttaaaccaaaagtttcaagatccctacctggcgatagtgattgatcctgaaagatcagtcagacaaggatttgttgagattggagcattcagaacgtttgctgagccagccgttggaagatcgtcgtcgtcagtttcctctgcaagtggtgcaggaattagtgatgttgcgttttcttccggtagaaacagtgcatctggaatgtcctcagttctgagtgcaagtaatattagcattgccgaagagctaagcaaacaatcgatcacccaaaatgtttttgacagaactactacaaagattcccaagggcaaaatgactgattttggagctcattcaggaaaatattactcgctagaggttaaggttttcagatctccactggaggagaaactactggatacgtttggttctaaaacctggattaaaggtttaacgaactactccaacgttgttaatgccgaggaaactcaagtggagttaatgcataaaataatggaagccacggagaacttacggaaggaatctccttctaaattgccatctttggtgatggggaacctgatttattcaggtgcctctcaaggaacaacagggaaccgcaagcgctcaatgtccaaatcttctatttattcgggtttacaagcttcatcgggtatacccagttctaggtatcctacgaagggaaaaaatatgagtggatctcaattcaatgatgacccgctagcaagatcactggataaaataccgccagatagtccagatcaacagtacgatggcgcattatccattcaacaaccgaaaagagcatataatacacatacttctagagcaggtgggttggccagcgttctgtcctctgggagtatggatcctcaaagttactccatggtaggacgaatgagtctaactaatcaatcgccggggacagctctgagaggcctaaatacacctcccaacaaacgaccgcagagaaaccctggtcatacaagctcaggtcaaggaggaacgcctggaggagtcagtcggtccaaagagaaaattaacaagccaataggtataagcatgattagcaaggatttcaaggttgtcatctcacaacaggtcaaccagatgctacgtcgtcacgtccagaatgacctttttggatccaatagtccctaaPAS_chr3_0635 
57atggatcatgcccaacgattgctagaactaagtttttacaatcaaagtctgggcaaatcagtgatagcaaagaaatacagaatagaatcctctcgatatttgaatgaacaactggacaagtccttgacaagagataatgatctgattggattatgccgtatagcattagacaacaagttgaccatatcagataagattatatggatgagctctcaagttgaagacaacttctttccgccagtttttcaaggcttgaagacgtatattgatagcgacgagatttatcaagagaaacttttaagcgtaccagcggattttgaaccaatagttgaatggaagagttgcacagagttgcccaatgaatggtcaaacaatggtgtggacaatttatttcaggattctttagatgactgttcgtttgtagcttcatttctatcctgcaacaatattggtatccctctcatggataaagtcattccccacaaaaactcgttcaaatatgcggttagactgactttcaatggttgcgaaaggttggtgtttattgatagccgtttgcctttgcttaggaatacttccaagactttacgagtgtcaagtttttctaacaaagatctcttatggcctagcatcatcgaaaaagctttcctgaaaatgtgtgatgatgggtacaagttttcaggatcaaattcagccattgcaaactatgctttgactggctggatccctgaagtcattaaaacttcttcatgtacaatagcagatattagccgattgcatgaggattttcggaacggaaacgtagtactatgcttgggaacgggcaatctgaccgagcgagaatgcaaacagtatggattgatccccaatcatgactatgctgtcactaaactatcatttacgaatgattcagaatacaagtttgacattcgtaatccgtggactaaagggcagaaagcagtgacaattacagatctttcaacctttgaagttatctacgcaaacagaaatcctataatgttttcgcacatgaaccagctaagcggtatctgtcaaagtcaggttaatgaagagttcatagatctaattcttaaccattcgcagtataccctaggcaatgacggtaattctacaattgatgtgattcttttctttgaaagacattcgttaagaaagaaaatcagtgcagagtctcgtattgagattttccaatcagaaggcgaaagactaatctccagaagaaataaagcaagcaaggaatgtgtttctaataataccaactttcatttcataacaatcgaactgaaaccgttagaaaaggtaactgtggtaatagatatcggcgagtcttcgattcgaagccatccatttactctaaaggcttttgccaatgattcaactataactttgaacaaagcactttctagacctggttgtttcaagcaaatggacctagagctaacgcccttaaactctggtgggaattgggataattatgcttattacaaaaatccacaactcatagtcactcttcacggagattcaacggatgaagctccatttgaatctgctgttttcagcaaaagtgataagaccctatttacgtatacagtgttttggaaaagtgacgatccagactttcctttcatcactgacgcaagcaagaacaagctcgtaagcacagacaataagtataaatacagatcatgtacaagatcaagagttgtttcttgcgacaaaagctatttgttcgtgctgagctcctacgaacctgatgcaattgagtctttcaaagtattttttcaatgttcccacgatttttctatagagtgggctgagacgtcgcttgggcttttcacaaaggaagaaactttctcctggaaggaccaattagtcaaggagttcattattcaagtctataacccttcaaagttgaaagttcacgcagtaaacaccaacaacaaacgcagatcaaaactaaattgctctctctcattc
caaaacacattaatcagctctttgcaagactacacagacaatctctatggatgctttattagcgggaacttggagattcccggcaagtatctattacaagttcataaaaacattatatctaacgaagaatgtttggtcgaaattggatctagttcgtcatttgagttatgggaacatcattaaPAS_chr4_0503 58atgttgaaaactcgatttcattccagaaagggttttgtaatctacagtggagatgatgaagagagtgacgaagagagtaaacaatggatgtttcccgagtcgacctttgtaaccaatgggtttgaccaattgttcaaggtgagaaatgtcaataccattaatgacgacgatgacggctaccaatcgttcgatcaaccggattgggcgcaagatttaaccgcagatactcagtatcttgctttaggtgacgaaggggagaatcatcgttcacaacaagagataggcaacaggaaaagagccaacaaaaagcaaaagaagccaactaaagcaaagacaaaacgtcaacaaagacgcacagccaaaaatgatcaatccacggaacgatctgccatttcacaaccttctaacttaagtacactgaactccttactcaaatctgttcggtctgaactttccaattctgatgggagtccccacacattctacgatgtatctctctatgaagaagatctgaacaacctagctgatgacgaatggttgaacgataataacgtctcgtttatctacgagtacattgaaagattttacattacccgttgtttgagcgacaagcttcaattttcatcaaagaagatggtcaattctcaaataatactcctccgaccttctatggtttttttgctggcacattcaactccaaaagatatccaggattttctcccaccgttggataagtctggctttatattccttcctctgaacgacaatgatgatctggaaatggctgaaggtggatcccattggtgtcttttagttgtagctgttcacgataacaaatgtttcctctatgactcattagagaatgccaatctcacagagtctgttgcgcttgtgtctaagctgtccactctgctaaacaggcgaatacaactcgttgaaaatacacattgtcctcaacaactcaatggcagtgattgtggagtaatcacaacccaaattacagcactactggtatcccgactgctttgtgttttgccgggacatcctataaatttggatcttcaaaatgtagctatcaacgcaataagcgggagaatcttcatgttaaaactcctccaacatgttctgaacaattaa PAS_chr2- 59atggcaccaccagtccctgtatatacgagagatgaagtcaagatgcaatttccacagtacatgatgaaatttttgccttcaaactgtgagct1_0569gtactccatcatccagaaccaatgtaccttctctgctgacgagataatatgtgtgcccttcaagagggtgtttgccaaatgccggaggggaaaccaagaagccaagaggaacataataccagagaatggaggactgaatttaactggaaagaaactaatcccaagagaatacacagtcattgaagttacggactccctaacgaacaagtacgacaatagtagcctcatggacagattttttgaggcagaaagagatttaatgataaggtttcaagaatatgaggaacggaacagtaaggaaggagaaataaagtag PAS_chr3_1223 
60atgctcagacagtttgctggaagggagttcaagcgtcggttttctacgggaatcaagacgatgccaacaaagcttaccaaactgccaaatggtattcgtgtcgtaacggacgaagctccgggccattttagtgccatgggcattttcgttgatgctggttcaagatatgagagccagtttccagaattaaccggccactctcacatcatcgatagacttgcattcaaatcaacatccaaattcgatgggaaatctatggtagaaaacaccaatcatttaggtggcaactttatgtgtgcctcttcaagagagtcattgatataccaggcttcagtgttcaacaaagatgtggacaagatggctgaaatcctcagttctacagtcaaagaacctttatttactgaggaggaagtttctaatcagatagcaacagcagattatgagttggatgagttatggctgcaacctgacctaattcttcccgaattgtctcaacaggtagcttatggatcaaaaaatttgggttccccgctgctctgtccgaaggagtctttagcaaacatctcaagagaatcccttttgaagtatcgtgaaatattttttagacctgagaacttggtcgttgctatgttgggagttccccacgagaaggccttggaacttgttgataaaaatttaggcgatatgaaatctgtcggttccagtccagtggtcaaagaacctgctaaatatacaggaggagaactttctttgcctccagttcctcctatgggtgggcttcccgagtttcatcacatatatcttacatttgaaggtgtccccgtggactctgacgatgtctactcactggctactttgcagatgctcgtcggtggtggtggatctttctctgctggtggtccaggaaaaggaatgtatgccagagcatacacgcgagttctgaatcagtacggttttattgaaagttgcaattcatatatacacaatttctcagactcggggctgtttggtctctcaatttcaagcattccgcaggcaaataaagttgttgcagaactcttaggtcatgaactgagctgcttgttttctgaaaatccgggcaaaggtgctcttaccaatgccgaagtaaaccgtgccaaaaatcagctacggtcttctttgttgatgaacttggagagcaagatggttcaattagaagaactaggaagacacattcaagtttatggcagaaaagttgatgtcacagagatgtgtgataaaatcagcaaagttacaaaggaagatctagttgcaattgcaaagaaagtcttgaccggaagcaacccgactatagttgttcaaggtgacagagaatcttatggagacattgagggtactttggcatcttttggagttggtttagatgccgcttccaaagcttcaaagaaaaaaacgagaggttggttctaaPAS_chr2- 61atggcaattatcaagttcaacgcaggcaaagtcaagattgacgaggaaaccaagctttgtacacccttggcaacaagaggagaaataatcgt1_0597ccaattgtcggctgagggcgaagagttttatgatttcaaatgggtccctactgagaacacagctggtgaaggtaaccagtcagagacattcttggtcattccgggcgatgtgacgtggaaacacgtcaaaagttgtaaagatggtagagttttcaaattgacatttttgagtagtggggcaaagagtttgttctggatgcaagatgataatggaaacgaggatgacccatcagagttgacaaccaaagataaggaaattagtgaaaaaattaccaagttgttcgacgaagaagagtga PAS_chr1- 
62atgaaacacttggctgtccataagtacaaggtaggagccatcgcagctggcttggttgtctcctataaaatctttgcctaccgcgctgcgtc1_0327ttcctcctcctcaaacgtcatcaacttgaccaatatggcaaaaactccaatcactttaaaaccccctcaggctccactccgctgggaccatactccagagcagatccttgccgaaactgataagtatatatctaccagtcaagaggttgacgattgggtggcaaacagctttgccactgccaatgtggacaccatcaagaaaatagccgccgctgagaatgaacaatacttgccactgtgtcaattgagtttttatcaacatatctcggataaccaggacgttcgtaatgccagtactgttagtgaggagaaaattgataagttctccatcgaatccaaccttagagaagatgtgttcaaaacagtgaacaaagtgttcaaacaggttcaagaagattcggaactccaaaagaccttggacccagaatttaggcgtttactagaaaaattgaacctaggttacgtgagatctggtttagatttatcccaggagaagagagaccaagtcaagagtttgaaacaagaactatcaaccatttcaatcaagtttaataagaacttgggagaggaaactgaacacatttggttcaccactgaggagttaaaaggtgttccagaatcagttgttgagcagtttgaaactaagaatgagaatgatgttacttaccacaagatgacatacaagtatcctgacctgttcccggtactaaaatatgccgttaatccagctacgagacaaagagcttttgtcggggatcaaaacaagatacctgaaaattcaggattacttgtgaaagccgtcaatttgagaaacgaacttgcaaaagttttgggttatgatacctatgctgactatatcctggaagtgaagatggccaagaactccaagaatgtttttgaatttcttgatgatgtaagggaaaaactcagacctctcggagagaaggaactgcaaagaatgttgactctcaaggctaacgacccaaatgctgttgataaggaaaattactacgtctgggatcatcgttactatgataacaagcttcttgaatctgaatacaaagtggatgagcaaaagctggctgaatactttccaatggagtccaccattgaaaaaatgcttgccatttacgagcacttgttcaatttgcagtttcaacaagttgacgattcggagaaacaagtttggcatccagatgtaaaacaattctccgtttggaaaatcgataaccctgattctcctgaatttgtgggctggatctattttgatttgcatccaagagaaggaaaatacggtcacgctgctaattttggaatcggtcctagttacatcaaagaagatgggagtaaaaattatcctgtcactgctttggtttgcaacttttctaaaccatcaaaggataagccatccctattgaagcacaatgaagtcactacattcttccatgagctaggacatggtatccatgatttaattgggcaaactaggtatgctcgtttccatggtacttcagttgctcgtgatttcgttgaatgtccttcacagattctagagtactggacctggactagagatcaactcaagtctctttcccaacattacaagacaggagaagccctctccgatgaactcattgattcgctagtcaagtccaagcatgtcaatggcgccattttcaatcttaggcagttacactttggtctctttgacatgaaactacatactgccaaagagcctgaatctttagatgtgacaaggttgtggaacgaattacgtgaggaagtcgctctggttaagaatggtgaccaaattacgaaaggatacggttcatttggacacctaatgggcggttatgctgctggttactacggatacctgtattctcaagtgtttgccagt
gacatttattacacctttttcaaagctgatccaatgagtacagctcaaggtatcaagtaccgtgatatcattcttgccagaggtggatcaagagaggagctagataatctcaaggaattacttggaagagagcctacatctgatgcctttatgactgagcttggagtagaaaatggtgcgtccaagttgtaaPAS_chr2- 63atgcgttttttggtctcatcctttcggcccttcagacatacaatttcgtcgcatatctcaatgggccaggctctgtctgccattcgtgtatt2_0380tcataaaaattctcactcacgtacccaaggtttaaggcgccactctcactactgttgccaccgcaagatagatatgagtacttctactaaacttccagagcgtcaattgctaccagccaatgttaggcctaccaaatatgatttgacattggagcccttattttctaccttcaagtttaacggagaagagactatacatttagatgttcaggaggactccagttctattacgctacacgctctagacatcgatctccaagattcactattgataacttcaaacaagtctaagactcccccgcttcatgtgacaagcaatgatgatgaccaatcgctcacttttcaattcaaagagggtactctagtaaagggagataaggtgcagctgcagttgaaatttgttggtgaattgaatgataagatggccggtttttaccgctcttcatatgaagagaatggagaaactaaatatttggcaactacccagatggagccaacagattgtcgtcgtgctttcccttcctttgatgagccatcgctaaaagccgtattcaacattgccctcattgctgatcagaaacttacttgtctctcaaacatggacgtgaaagaggaacaatctctcggagatagaaggaagaaggtgatattcaatcccactccactaatttctacttacctaattgcttttattgttggtgatttaaaatatattgaagccgactataactatcgcattcctgtcagagtttatgccacccctggtttagagaagcagggtcgtttttctgtcgagcttgctgctaaaacattagaattctttgagcaacagtttgatattgattatcctcttccaaagatggacatggtggcgattcatgatttcagtgcaggagctatggaaaactttgggcttgttacctatagagttgttgatttgctgtacgatgaaaaaaattcaaatttggctactaagcaacgtgttgcagaagttgtccaacacgaattggcgcatcagtggtttggtaatcttgtcacaatggagtggtgggagggcctttggctgaatgaaggctttgctacatggatgtcttggtactcttgtgacaagtttttccctgattggaaagtatgggaacaatatgttacagattctttacaacaggctctggctctggacgctctacgtgcttctcaccctattgaagttcctgtgaaaagggccgacgagatcaatcaaatttttgacgcaatttcctattctaaaggatcctccttgctaaaaatgatctccaaatggctcggagaggatgtgttcattaagggagtctccagttatttaaaaaagcacaggtatggtaatacgaaaaccaccgatttgtgggaatcgctttctgaggtgtctggaaaagatgtggtcaaagttatgagtatctggactggtaaaattggatttccaatcatctcagtaactgaaaatgcaaaccgtatcacttttactcagaacagatatttaactactggtgatgtaactcctgaagaggatacgacgatttatcctgtttttttgggactcaaaacagaaagctcaactgatgagtcgctggtccttgactcaaggtcaatgtcagtagatatccagaattctgactttttcaaagttaatgctgaacaagccggtatttacaggaccaatta
tgcaccagagagatggatcaaacttggaaagcaacctcaccttctaagtgtagaagaccgtgctggtttggttgcggatgcgggcgctctggctagttctggtcactcatctacaaggaactttttgaaccttgtaaattcatggaaagatgagtctagctttgttgtctgggacgaaataacttcccgtgttgcagctttaaaagcagcttggttatttgaatcccaatctgacattgacgccctgaatgctttcgtaagagaccttatttctacgaagatcaaaagtatcggatggtcattcaatgataatgaaccattccttgaacaaagactaaagagccttctatatgctactgctgctggtgcaaaagtaccaggagtagttaaatcagcattgataaactttcaaaaatacgttgctggtgataagactgccattcaccctaacataaaggcagttacgtttcaaactgttgcggcccaaggatctgaaaaggaatgggatcagttactcgacatctacaagaaccctgtatctattgatgagaaaattattgctcttaggtctctcggaaggtttgaagatcccatcttgatcgcaaagaccctggcactgttatttgatggttccgtaaggtcacaagatatttacgtaccaatgcaaggccttcgtgcgactaagataggagtagagtcacttttcaagtggttgactcttaattgggacaagatttataaattgcttccacctggtctgtcaatgcttggttctgtggttactatcagtacttctgggttcacttccttggatgatcaaaagcgtgtcaaagatttctttgcatcaaaggataccaaaggcttcgaccagggtttggcccaggcgttagacaccatccaatccaaggcaagttgggtacaacgtgactctaggaatgtatccgattggctacgtgagcagggatacaaaaaatag PAS_chr3_0928 64atgataaggatatccttgctgaaaagagcactgtttccctacgggcgactaccaatgcataatggtaggtggtattcagacataggtggcggaaattcaaggaatcggaacgaacagaaaccaaaattgcctgtaccaactagtaatgaagttaaggacaatgagtcaaacccggacttctttattaaaaacggctttagatcagctgatattgcagagacatcctttgtgaaagacaagggtgctacagtcgaagaggaacgtaatacatcggacagttcacacgaatctcctcaacttaattttaaggaaaccaacgacgaaacgaattcaacgatccaaccaccagtggcaaaattacccaccccaaagcaattgaaacaatacctggataggttcatcgtgggacaagagaagtgcaagaagataatgtcggtcgcagtttacactcattatgttcgaataaataaccaggctcagaaacggaatcagaaggtcgattcctctgaagaaaatgttgagaatgggtttccaaatgttactaaagaatttgaggacgaaaatgacccagattatgttccggatttggagaaatcaaatgttcttttgctgggaccgtctggatcaggcaagaccctgattgctaagactctcgctaaatgtctgcaggttccatttataattcaagattgtacctccttgacccaggctggttatgttggcgaggatattgagagctgtattgaaaagttgctaattgattcagactacgatattgaaaggtgtgaaaagggaattattgtgctggatgaaatagacaagttggccaagccctctgtctatacaggaaccaaagatattgcaggagagggtgttcaacaaggccttttaaaactggttgaaggtactacagttacggttcaatgcaagaggagcaatgctcctgatcataatcagttcggattgaatggcaaagctacaaatcaggacaaggaaaattatatcgt
tgacactacaaatatcttatttttaaccctgggagcgtttgtgaacctagataagattgttgcttataggctgaagcagaactctattggattcgatactgatgagtcgaaagatatttctgaaacagactcagtttccgacaaatctacattagaatatgttacacttccagatggatcaaaagtttcagctctggaacttgtgtcttctacggatctacagaattatgggttgattccagaactgatcggcaggcttccgattgtatcttcactttctcctttaacagttgatgatcttgtggctgtcctgactgagcccaggaactcgatactaaagcaatatgtgcatttctttgacactgtcaatgtcaaacttgctatcacttccaaggcaatcagaaggatagccgagatctcgatcaagaatggtacaggtgcaagaggtctcagagccattttggagaaactgctactcaatgccaagtatgattgccctggtagtagtatttcatttgtgttagttgatacagatgttataagtaagtctatcgatgagaataaggaaacgggggaattcgtcttcaaagatggtgagccaaagtattactcgcgtggagaattattttcctttttcaatgagttatcaaaagaagacgaaaaactcaagacatcaattgaaaagatgtgccaaataccactttccaagaatcgcatagtttactccgaagaggagcaggcaaggttggattcttctaaacctctcgccgtgaagcactatgaacctttcatttga PAS_chr1- 65atgagcttcaacctgctaagtgttcctttacgaacgtcaaagccgataccgttaggcgaaagcctaaaagagcttatcaacaatcagtacta3_0184ccagacatctgctgcgttcaaatcggatatcgaagagatcgaccaactaagaaatgatgtcctatcaatagaaccaaacaatgatggacttgcattgctcaagagatactatgtacagttagccagcattagccaaaaactccctgattattttatggagtatccctggtttggaacattaggataccaagtaactggccccgtagctctaaaatccctctatttcgaaagaatcaatatagcgtacaacatcgcagcgacgtattcaatcataggtttaaacgagcccagagctacaggagaaggcttgaaaaaatcatgcatttattttcagtatagtagtggggcattcgaaagtgtactgaagctagtggagcaaaaaccgaaagagctgacacttcccattgatcttagtgttaacattatgaaaaccctggctaaactcatgctggctcaggcccaggaatgtttttggcaaaaggctgtttctaacactttaaaagataacgttattgcaaggttggcctttcaagtatctcaattttacgatgaagctctgtctatggcttacaagtgcgatattttaaagtctgaatggatagaacatatgagttgcaagaagctgcattttaaggctgcggcccaatttagacttgcttgtgtggcagtcgctgcttctagacatggagaggaaatagcaagattaaggattgcaaataccatttgcgaaacagcatctagagaagccaagtatcaccttccctctgtatcttccgatttggagagtctttcgaagataatcaaagactctttaagaagaagtgaacgtgataatgatctaatatatctgcaggaagttcctaatgaatcagatcttcctccaattgttgcagcatctatggttgaacctaagccaatagttgagttaaattcagctgaatgtgcgaaagatacaaagaaatacggcaaaatccttttccatgatcttatgccatacttagtgattgaaattgcacaggcatttagagagaggcaggattcttatgttgtaaagcatatcaaggagcccatggagatgctgacaaagattctt
cacacaatccttgctgaaaatggacttccggcgttgatagataccatacaaaggcctcaaagattgccaaccaacatccttgaacattgtcaaatactcaatgaaaggggtggcatggacaaacttaaggtatttttcgaagatatcagcaagctaagacacaaaagtgagcaagttctccaaaactgtgtcgaattgctacaaatggaagagtccgaaaatgaggaaatgagaaggaagcatggatcacagaggtggaattttgctgactctagggaggcatcagcagatgtcaggaaaagtgtacaggcactagagggctatttgaaacaggcccatgatggtgatcaagtgatctggaatgacttcgaacaattgaagccactactaagcatgatgagtgctcctaattcaactaaattactggaagaatttgtaccaaattcaaaattcgtcagacttcctccagaattgaaccgaatcgttaacgaattaagagctgatgttaatcaggtcaaaaagctcgcatcgcaaagggaaacttttattaatacagttaaagtaaaaagcaccgacctgtccatattgcccttggtagtttcccattataagaaattacaacaaaacaacattaatacgatcacgacggaattgttcgaagaagtgttcagacgacaggttagcaacttcgattctgatatcagatttgttcaaaaacacagggacaaccaaatcgagttagagaagcatattaaatctttggtccaacaattcaatcagcttagagggaatatagatgcctcgcaagaacgccaaaatgcacttcagttgttggacgatgcctataacggataccttgatttggtaaacaacctcacacagggacttagtttttacaatgatttcactggaaaggcaaatgatgtctatttgagatgtcaagaattctacaactttcgtaaacaagaagccatgaagctggagcaggaaatatatgctgtatttgaacaaggtaaatctcctcagaaaaaacaactagaagatcaggtttcagatcaaccaaaaagtgaagtcaagtcttcaaagggttattctaatgagctgtggaaccccgacgttggaattaaatttggctag PAS_chr1- 
66atggtggcctctcttcacattgtcaatccgaatttggcctccgctttcagtttgcctcccaggtcaaacactttgagcgtttccatacacgc4_0286ttcggctttgttacagatcctggaatcaagttacttcgaccagaataagaatggtcgtatcataggaaccctcctaggttctaggtctgaagagacaacggaggttcaagtcaaagactctttcatagtttcccacacggaggacggagacgagtttaccattgattcttctcaacgtgaatttgtcgccatccacaagaagtctagcccaagagactcagtcgtaggatggttttccattaactctaaggtcgacagctttatcggactggtccatgactttttctcaaagggtccagatagcacacacccgtaccctgccatatatttgagtatccagttatgtgacgagagcggatccttcgtagagccagttttcaaggcgtacgttgcctccccagtgggatgttatggagctctggcaagtcacttagaccttgaaaaagctggctcttttgtcttctctgaagtcccaaccaaggtcatatactctgctaacgaaaaaagtctgctggctcatttcaagaacaacgttgtggaacccaaagttccaataccacaaaacgacacaaatcaactaatttcacaactcaacaaactcgacgtttccattgaccagttaatagactacgttgacaaagtcatttcaggatctctggatagaaatgatgtgaagaatgatgagattggccgtttcctgttgaccaacttagtttcccttccaacttctccttcaaaggaagagctttcatcttccataagctctcatatccaggactcactgatgatcgactacttggcctccgccgtgaaaactcaattagatgttagctccaaattaatgaacctggtacaagatgataaatag

TABLE 6
Polypeptide sequences of targeted proteases

Protease Gene Symbol/Locus tag: PAS_chr4_0584
SEQ ID NO: 67
Polypeptide sequence:
  1 MLKDQFLLWV ALIASVPVSG VMAAPSESGH NTVEKRDAKN VVGVQQLDFS VLRGDSFESA
 61 SSENVPRLVR RDDTLEAELI NQQSFYLSRL KVGSHQADIG ILVDTGSSDL WVMDSVNPYC
121 SSRSRVKRDI HDEKIAEWDP INLKKNETSQ NKNFWDWLVG TSTSSPSTAT ATGSGSGSGS
181 GSGSGSAATA VSVSSAQATL DCSTYGTFDH ADSSTFHDNN TDFFISYADT TFASGIWGYD
241 DVIIDGIEVK ELSFAVADMT NSSIGVLGIG LKGLESTYAS ASSVSEMYQY DNLPAKMVTD
301 GLINKNAYSL YLNSKDASSG SILFGGVDHE KYSGQLLTVP VINTLASSGY REAIRLQITL
361 NGIDVKKGSD QGTLLQGRFA ALLDSGATLT YAPSSVLNSI GRNLGGSYDS SRQAYTIRCV
421 SASDTTSLVF NFGGATVEVS LYDLQIATYY TGGSATQCLI GIFSSGSDEF VLGDTFLRSA
481 YVVYDLDGLE VSLAQANFNE TDSDVEAITS SVPSATRASG YSSTWSGSAS GTVYTSVQME
541 SGAASSSNSS GSNMGSSSSS SSSSSSTSSG DEEGGSSANR VPFSYLSLCL VVILGVCIV

Protease Gene Symbol/Locus tag: PAS_chr3_1157
SEQ ID NO: 68
Polypeptide sequence:
  1 MIINHLVLTA LSIALANDYE SLDLRHIGVL YTAEIQIGSD ETEIEVIVDT GSADLWVIDS
 61 DAAVCELSYD EIEANSFSSA SAKFMDKIAP PSQELLDGLS EFGFALDGEI SQYLADKSGR
121 VSKREENQQD FNINRDEPVC EQFGSFDSSS SDTFQSNNSA FGIAYLDGTT ANGTWVRDTV
181 RIGDFAISQQ SFALVNITDN YMGILGLGPA TQQTTNSNPI AANRFTYDGV VDSLRSQGFI
241 NSASFSVYLS PDEDNEHDEF SDGEILFGAI DRAKIDGPFR LFPYVNPYKP VYPDQYTSYV
301 TVSTIAVSSS DETLIIERRP RLALIDTGAT FSYLPTYPLI RLAFSIHGGF EYVSQLGLFV
361 IRTSSLSVAR NKVIEFKFGE DVVIQSPVSD HLLDVSGLFT DGQQYSALTV RESLDGLSIL
421 GDTFIKSAYL FFDNENSQLG IGQINVTDDE DIEVVGDFTI ERDPAYSSTW SSDLPHETPT
481 RALSTASGGG LGTGINTATS RASSRSTSGS TSRTSSTSGS ASGTSSGASS ATQNDETSTD
541 LGAPAASLSA TPCLFAILLL ML

Protease Gene Symbol/Locus tag: PAS_chr1-4_0289
SEQ ID NO: 69
Polypeptide sequence:
  1 MVASHVNNAS ASRSNTSVSH ASASSYDNKN GRGTGSRSTT VVKDSVSHTD GDTDSSRVAH
 61 KKSSRDSVVG WSNSKVDSGV HDSKGDSTHY AYSCDSGSVV KAYVASVGCY GAASHDKAGS
121 VSVTKVYSAN KSAHKNNVVK VNDTNSNKDV SDDYVDKVSG SDRNDVKNDG RTNVSTSSKS
181 SSSSHDSMDY ASAVKTDVSS KMNVDDK
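
The polypeptide sequences in Table 6 are printed in blocks of ten residues, with a running position index at the start of each row of sixty residues. Recovering a contiguous sequence for downstream use therefore means stripping those indices and the spacing. The following is a minimal Python sketch of that cleanup, assuming the layout used in Table 6. The function name `parse_numbered_sequence` and its consistency check are illustrative assumptions and are not part of the disclosure.

```python
import re


def parse_numbered_sequence(block):
    """Return the contiguous residue string from a position-indexed block.

    Each run of digits is treated as the 1-based position of the residue
    that follows it; the check below flags rows that were dropped or
    merged during text extraction.
    """
    residues = []
    # A position index (digits) followed by one or more residue blocks.
    for match in re.finditer(r"(\d+)\s*([A-Za-z\s]+)", block):
        index = int(match.group(1))
        chunk = re.sub(r"\s+", "", match.group(2))
        if index != len(residues) + 1:
            raise ValueError(
                "position index %d does not match %d residues read so far"
                % (index, len(residues))
            )
        residues.extend(chunk)
    return "".join(residues)


# First 70 residues of SEQ ID NO: 67 as printed in Table 6.
example = (
    "1MLKDQFLLWV ALIASVPVSG VMAAPSESGH NTVEKRDAKN VVGVQQLDFS VLRGDSFESA "
    "61SSENVPRLVR"
)
sequence = parse_numbered_sequence(example)
print(len(sequence))
```

The index check is a design choice of this sketch: it raises an error when a printed position does not agree with the number of residues read so far, which is a common symptom of extraction damage in long sequence tables.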

TABLE 7
Forward (F) and Reverse (R) Primers for 5′ and 3′ homology arms (HA) targeting protease ORF

Description                 SEQ ID NO:  5′ to 3′ Sequence
PAS chr1-1 0174 5′ HA F     70          ACCTATTGTTTACCTTCCTG
PAS chr1-1 0174 5′ HA R     71          GAATTCTCTCACTTAATCTTTAGCTCCCATGCTCATCTTG
PAS chr1-1 0174 3′ HA F     72          GCGGCCGCaagaagttgattGTTTATTTGTAGGCGGTGCC
PAS chr1-1 0174 3′ HA R     73          GGGCTATCCGCCTTATCTTG
PAS chr1-1 0226 5′ HA F     74          AATAACTTCATGACTGCATT
PAS chr1-1 0226 5′ HA R     75          GAATTCTCTCACTTAATCTTAGTTTAAATAATATGGAGAT
PAS chr1-1 0226 3′ HA F     76          GCGGCCGCaagaagttgattATTGGAGAAAAGGAATACAC
PAS chr1-1 0226 3′ HA R     77          GGCATCTCCGTCTGGTGCAG
KO_PAS_chr3_1087 5′ HA F    78          CAAGGTTCGAAACTGCAGCT
KO_PAS_chr3_1087 5′ HA R    79          CTCACTTAATCTTCTGTACTCTGAAGAGAGAGCAAACCAATGGCAA
KO_PAS_chr3_1087 3′ HA F    80          AGAAGTTGATTGAGACTTTCAACGAGGGTCCTTTGGCAATCATTGGT
KO_PAS_chr3_1087 3′ HA R    81          ACCCCAGGACCAGGTATTTC
KO_PAS_chr4_0584 5′ HA F    82          TACTACAGGCTGGCTGTTCC
KO_PAS_chr4_0584 5′ HA R    83          CTCACTTAATCTTCTGTACTCTGAAGAAGTCCAACTGTTGAACGCC
KO_PAS_chr4_0584 3′ HA F    84          AGAAGTTGATTGAGACTTTCAACGAGGGTCCCCTTCAGCTACCTTT
KO_PAS_chr4_0584 3′ HA R    85          TCCCTGCTAAGCCCTAATCG
KO_PAS_chr3_0076 5′ HA F    86          AAGTTGTATGGCCGTCCTCA
KO_PAS_chr3_0076 5′ HA R    87          CTCACTTAATCTTCTGTACTCTGAAGTGAGTCTTGGTTGTGTCGGT
KO_PAS_chr3_0076 3′ HA F    88          AGAAGTTGATTGAGACTTTCAACGAGGCCTCCTGTTTGATCGGTTC
KO_PAS_chr3_0076 3′ HA R    89          GTGCCATGGTGACGTTACAG
KO_PAS_chr3_0691 5′ HA F    90          CGGAGTTATAGGGGACGCTT
KO_PAS_chr3_0691 5′ HA R    91          CTCACTTAATCTTCTGTACTCTGAAGCGTCACATCATAGCCGTTCTC
KO_PAS_chr3_0691 3′ HA F    92          AGAAGTTGATTGAGACTTTCAACGAGCGTCAAAAGTGGTCGTGGAC
KO_PAS_chr3_0691 3′ HA R    93          TGGCCCAGTTACACGGAATA
KO_PAS_chr3_0303 5′ HA F    94          GTCGATCGTTGGTGTGTGAC
KO_PAS_chr3_0303 5′ HA R    95          CTCACTTAATCTTCTGTACTCTGAAGGAGCCGACTTTGACATCGAC
KO_PAS_chr3_0303 3′ HA F    96          AGAAGTTGATTGAGACTTTCAACGAGAGCGAAGAGACTGGTTCCAA
KO_PAS_chr3_0303 3′ HA R    97          AGCTGTTCTAACCGTCCTCA
KO_PAS_chr3_0815 5′ HA F    98          CTTGGAATATCTGTGGGCGC
KO_PAS_chr3_0815 5′ HA R    99          CTCACTTAATCTTCTGTACTCTGAAGTCATGACCAGCAGTTGTTCA
KO_PAS_chr3_0815 3′ HA F    100         AGAAGTTGATTGAGACTTTCAACGAGATGCTGCAGGAAGGAACACT
KO_PAS_chr3_0815 3′ HA R    101         CAAACTCTGCACCTCCAAGC
KO_PAS_chr3_1157 5′ HA F    102         CTCTGATTGCACGAGAAGGC
KO_PAS_chr3_1157 5′ HA R    103         CTCACTTAATCTTCTGTACTCTGAAGTGAAAGGCGATTGGAGTTGC
KO_PAS_chr3_1157 3′ HA F    104         AGAAGTTGATTGAGACTTTCAACGAGCTGGCTCTGCTTCTGGTACT
KO_PAS_chr3_1157 3′ HA R    105         GATGTTGAGGCGGGCATAAG
KO_PAS_chr1-4_0164 5′ HA F  106         TTTCAACGGGGTTCTACGGA
KO_PAS_chr1-4_0164 5′ HA R  107         CTCACTTAATCTTCTGTACTCTGAAGGTGGTAGTATGTGTGTTGGTGT
KO_PAS_chr1-4_0164 3′ HA F  108         AGAAGTTGATTGAGACTTTCAACGAGCTGCGCTTTCAAGTACTGCA
KO_PAS_chr1-4_0164 3′ HA R  109         TGTCTTCCTCGTCTTCCTCG
KO_PAS_chr3_0979 5′ HA F    110         CGGGCAATAATCAGTGGAGC
KO_PAS_chr3_0979 5′ HA R    111         CTCACTTAATCTTCTGTACTCTGAAGCGTTGGAGGTAATGCATGGG
KO_PAS_chr3_0979 3′ HA F    112         AGAAGTTGATTGAGACTTTCAACGAGGGCGGACCGTGTATTAGAGA
KO_PAS_chr3_0979 3′ HA R    113         TCAGAGAAGCCAGTGGAAGG
KO_PAS_chr3_0803 5′ HA F    114         TTCCTCGGCCTCTTTATGCT
KO_PAS_chr3_0803 5′ HA R    115         CTCACTTAATCTTCTGTACTCTGAAGCAACGTGGCTAACTCCTTGG
KO_PAS_chr3_0803 3′ HA F    116         AGAAGTTGATTGAGACTTTCAACGAGGTTGTCGACGGCATTGAAGA
KO_PAS_chr3_0803 3′ HA R    117         TCGGTTCAAAGCCCCTAAGT
KO_PAS_chr3_0394 5′ HA F    118         AGGTGTGAAATGCGCTGATC
KO_PAS_chr3_0394 5′ HA R    119         CTCACTTAATCTTCTGTACTCTGAAGAAACCAACAACGCCTGGTAC
KO_PAS_chr3_0394 3′ HA F    120         AGAAGTTGATTGAGACTTTCAACGAGTCACAGGCTGAAGGATCGAA
KO_PAS_chr3_0394 3′ HA R    121         CCATGGTGTGTTTTCCGGTT
KO_PAS_chr2-1_0366 5′ HA F  122         TGAGGGACAAAGTAATGGGGT
KO_PAS_chr2-1_0366 5′ HA R  123         CTCACTTAATCTTCTGTACTCTGAAGACCGAAGTCATGGTTGGAAA
KO_PAS_chr2-1_0366 3′ HA F  124         AGAAGTTGATTGAGACTTTCAACGAGCTACCGCAGACAACCCATTC
KO_PAS_chr2-1_0366 3′ HA R  125         CGCTCCCTCATCGAGTACTT
KO_PAS_chr3_0842 5′ HA F    126         CAGACATCGTGGAAACTGCC
KO_PAS_chr3_0842 5′ HA R    127         CTCACTTAATCTTCTGTACTCTGAAGTATCTGCTTCGATCCCTGCA
KO_PAS_chr3_0842 3′ HA F    128         AGAAGTTGATTGAGACTTTCAACGAGTTCTCCCGTCCAGTTAGCAG
KO_PAS_chr3_0842 3′ HA R    129         ATTTCAGAAGCTCCGCATCC
KO_PAS_chr1-3_0195 5′ HA F  130         ACAAAAGCACGCGATTGAGA
KO_PAS_chr1-3_0195 5′ HA R  131         CTCACTTAATCTTCTGTACTCTGAAGACACTCACGGTTGTTTGCAA
KO_PAS_chr1-3_0195 3′ HA F  132         AGAAGTTGATTGAGACTTTCAACGAGAACCCCAACAAGCGGCTATA
KO_PAS_chr1-3_0195 3′ HA R  133         ACCCGGATCTGCTAGTGAAG
KO_PAS_chr1-4_0052 5′ HA F  134         CGTATGCTCGTGTGACTGTG
KO_PAS_chr1-4_0052 5′ HA R  135         CTCACTTAATCTTCTGTACTCTGAAGTTCCTATGCCTGGCGATGAT
KO_PAS_chr1-4_0052 3′ HA F  136         AGAAGTTGATTGAGACTTTCAACGAGAGGGAGTCTTGTATAGTTGAGCA
KO_PAS_chr1-4_0052 3′ HA R  137         AGCAGGGGTATTTTCACGGA
KO_PAS_chr2-2_0057 5′ HA F  138         AGCATGATTGTGTTGGGTGG
KO_PAS_chr2-2_0057 5′ HA R  139         CTCACTTAATCTTCTGTACTCTGAAGAATCCGATACTGTAGCCCCG
KO_PAS_chr2-2_0057 3′ HA F  140         AGAAGTTGATTGAGACTTTCAACGAGGCAAAGAAAACTGGCCACAC
KO_PAS_chr2-2_0057 3′ HA R  141         GGAAGGCCCTATTCACGACT
KO_PAS_chr1-3_0150 5′ HA F  142         CACCATTTCCCTGCTGTGTC
KO_PAS_chr1-3_0150 5′ HA R  143         CTCACTTAATCTTCTGTACTCTGAAGTCAATACCGAAGACTCCGCA
KO_PAS_chr1-3_0150 3′ HA F  144         AGAAGTTGATTGAGACTTTCAACGAGGGGAGGTATTCAGGAGGCAT
KO_PAS_chr1-3_0150 3′ HA R  145         GCTCGATCAGATATTGTCCGC
KO_PAS_chr1-3_0221 5′ HA F  146         AGCAGCTCTCCAATCAGTGT
KO_PAS_chr1-3_0221 5′ HA R  147         CTCACTTAATCTTCTGTACTCTGAAGCTGGAATTGTGATCCCGCTG
KO_PAS_chr1-3_0221 3′ HA F  148         AGAAGTTGATTGAGACTTTCAACGAGTTTTGAAGCAAGCCTACCCC
KO_PAS_chr1-3_0221 3′ HA R  149         CAGGATCCAGCCGCTAAAAC
KO_PAS_FragD_0022 5′ HA F   150         TGAACAAGCAGCCACATCAC
KO_PAS_FragD_0022 5′ HA R   151         CTCACTTAATCTTCTGTACTCTGAAGTGAGGGCCATTCTGACATACT
KO_PAS_FragD_0022 3′ HA F   152         AGAAGTTGATTGAGACTTTCAACGAGGTGAGGTATTTAACTGCACGAG
KO_PAS_FragD_0022 3′ HA R   153         TCGCCTACATAGTCTGCACA
KO_PAS_chr2-1_0159 5′ HA F  154         ACCTCATGCCATGTCTGTCA
KO_PAS_chr2-1_0159 5′ HA R  155         CTCACTTAATCTTCTGTACTCTGAAGTTGACTGCCGCTTCAAAGTC
KO_PAS_chr2-1_0159 3′ HA F  156         AGAAGTTGATTGAGACTTTCAACGAGCCGCCAGAGAATTTGTGCTT
KO_PAS_chr2-1_0159 3′ HA R  157         TAGAGGTGAACGTTTGGCCT
KO_PAS_chr2-1_0326 5′ HA F  158         AATCCATCACCTCCACCCAG
KO_PAS_chr2-1_0326 5′ HA R  159         CTCACTTAATCTTCTGTACTCTGAAGGCTGCTGGAGTAAAAGGTCC
KO_PAS_chr2-1_0326 3′ HA F  160         AGAAGTTGATTGAGACTTTCAACGAGCAAGCAGCAACCATCTACGG
KO_PAS_chr2-1_0326 3′ HA R  161         AACCTCATCCACTGTCAGCA
KO_PAS_chr2-2_0056 5′ HA F  162         GGAAGACAAAGTTCGCTCCG
KO_PAS_chr2-2_0056 5′ HA R 
163CTCACTTAATCTTCTGTACTCTGAAGTCATAGTTGAGAGCCTCCTTGT KO_PAS_chr2-2_0056 3′HA F 164 AGAAGTTGATTGAGACTTTCAACGAGACAATGCACTAGGACGGGATKO_PAS_chr2-2_0056 3′ HA R 165 CTTGAATCAGGCGACGTACCKO_PAS_chr1-4_0611 5′ HA F 166 CCCAGCTCTCTTTCACTCCAKO_PAS_chr1-4_0611 5′ HA R 167CTCACTTAATCTTCTGTACTCTGAAGTTGAAGAGCAGCAGAGTCGA KO_PAS_chr1-4_0611 3′HA F 168 AGAAGTTGATTGAGACTTTCAACGAGTTAATTGCCCACAGTGTCGCKO_PAS_chr1-4_0611 3′ HA R 169 ACCTTCCACAGTCGACGAATKO_PAS_chr1-1_0274 5′ HA F 170 ACAAACAGTCAAATGCACGGAKO_PAS_chr1-1_0274 5′ HA R 171CTCACTTAATCTTCTGTACTCTGAAGTCCTTCCACCTTTCCAACGT KO_PAS_chr1-1_0274 3′HA F 172 AGAAGTTGATTGAGACTTTCAACGAGGGGGTAGAGAAGTTAGGGAGGKO_PAS_chr1-1_0274 3′ HA R 173 GGAACTACAACTGGAGGCCT KO_PAS_chr4_0834 5′HA F 174 TAGTGCCGGTTCCATGGATT KO_PAS_chr4_0834 5′ HA R 175CTCACTTAATCTTCTGTACTCTGAAGGGTCTATGGGTTGATGCGGA KO_PAS_chr4_0834 3′ HA F176 AGAAGTTGATTGAGACTTTCAACGAGATGTGTTGCTCGCTCTAGGT KO_PAS_chr4_0834 3′HA R 177 CGACAAACACACCAAGGTCC KO_PAS_chr3_0896 5′ HA F 178GTTGTTGGAGTGAGCGATGG KO_PAS_chr3_0896 5′ HA R 179CTCACTTAATCTTCTGTACTCTGAAGCCTCCGTTGATACTCCCGAT KO_PAS_chr3_0896 3′ HA F180 AGAAGTTGATTGAGACTTTCAACGAGTGCATTCAAGGCTGGCAAAT KO_PAS_chr3_0896 3′HA R 181 GCATATGGAGTGGTGTGCAG KO_PAS_chr3_0561 5′ HA F 182CGGGTAGCATTGAACGTACG KO_PAS_chr3_0561 5′ HA R 183CTCACTTAATCTTCTGTACTCTGAAGATGCTACGGTAAACACCCCA KO_PAS_chr3_0561 3′ HA F184 AGAAGTTGATTGAGACTTTCAACGAGACTGGAGAAAGCTTGGTCGA KO_PAS_chr3_0561 3′HA R 185 AGGCACCAGAAGAAAGAGCT KO_PAS_chr3_0633 5′ HA F 186GGACACGTTTGGAGCTTCTT KO_PAS_chr3_0633 5′ HA R 187CTCACTTAATCTTCTGTACTCTGAAGGCCCACCAATTCAGCAACTT KO_PAS_chr3_0633 3′ HA F188 AGAAGTTGATTGAGACTTTCAACGAGGATGCTGGTCACATGGTTCC KO_PAS_chr3_0633 3′HA R 189 AACCGCCAATAGTTTCAGCC KO_PAS_chr4_0013 5′ HA F 190GGATGAGAAAGCGGCTTCTG KO_PAS_chr4_0013 5′ HA R 191CTCACTTAATCTTCTGTACTCTGAAGGTGCCAAAAGTCTGATCCGG KO_PAS_chr4_0013 3′ HA F192 AGAAGTTGATTGAGACTTTCAACGAGTGCCACTTCGTTCTTTGACG KO_PAS_chr4_0013 3′HA R 193 ACGGATCAGTGATGGCGTAT KO_PAS_chr1-1_0379 5′ HA F 194ATGGGATCTGGACGACGTTT 
KO_PAS_chr1-1_0379 5′ HA R 195CTCACTTAATCTTCTGTACTCTGAAGAGCTGGATCACAAACATTCGG KO_PAS_chr1-1_0379 3′HA F 196 AGAAGTTGATTGAGACTTTCAACGAGCTTTGAGTGTTGGTCCCTGCKO_PAS_chr1-1_0379 3′ HA R 197 CGGCTACCAAGTCAGACCTTKO_PAS_chr2-1_0172 5′ HA F 198 GTTGCCCATTACGTCCTGTGKO_PAS_chr2-1_0172 5′ HA R 199CTCACTTAATCTTCTGTACTCTGAAGCCTTTGATCTTTGGTGCATCTTG KO_PAS_chr2-1_0172 3′HA F 200 AGAAGTTGATTGAGACTTTCAACGAGCACTACAGCTGGGAACGAGAKO_PAS_chr2-1_0172 3′ HA R 201 ACGGGTTGGAAAAGTTGAGC KO_PAS_chr3_0866 5′HA F 202 AGTGGGGTTGGAGATTGGAG KO_PAS_chr3_0866 5′ HA R 203CTCACTTAATCTTCTGTACTCTGAAGACGATTCCAGCATAGCCTGT KO_PAS_chr3_0866 3′ HA F204 AGAAGTTGATTGAGACTTTCAACGAGCTGGTAGCCGCAAAACTTCA KO_PAS_chr3_0866 3′HA R 205 GCGTTGAATCCTCCTCGTTC KO_PAS_chr3_0299 5′ HA F 206CTGTGGGGTCTGAACATCCT KO_PAS_chr3_0299 5′ HA R 207CTCACTTAATCTTCTGTACTCTGAAGAGCTGCTAGGGTTCATTGAGT KO_PAS_chr3_0299 3′ HA F208 AGAAGTTGATTGAGACTTTCAACGAGCTCCCTTGGGTACGTCAACT KO_PAS_chr3_0299 3′HA R 209 TGGCAGTCTTCACATGTCCT KO_PAS_chr1-4_0251 5′ HA F 210AGCTGGTCAAGTCTGGTACC KO_PAS_chr1-4_0251 5′ HA R 211CTCACTTAATCTTCTGTACTCTGAAGGAGGTCTAGTGTGTGAGGCT KO_PAS_chr1-4_0251 3′HA F 212 AGAAGTTGATTGAGACTTTCAACGAGAGAAGGTATAGGGAATATGCGGTKO_PAS_chr1-4_0251 3′ HA R 213 TAGCCACAACCCTGATGACG KO_PAS_chr4_0874 5′HA F 214 TACACTGGGACGCAGATGTT KO_PAS_chr4_0874 5′ HA R 215CTCACTTAATCTTCTGTACTCTGAAGTGCTCAAACTCTGTATCCGTTG KO_PAS_chr4_0874 3′HA F 216 AGAAGTTGATTGAGACTTTCAACGAGCTTTCAAGGCCGCAATGCTAKO_PAS_chr4_0874 3′ HA R 217 CTTCCTTTGCAGTTGGTGGT KO_PAS_chr3_0513 5′HA F 218 GGGTCTTTGGCTTTGGTGAG KO_PAS_chr3_0513 5′ HA R 219CTCACTTAATCTTCTGTACTCTGAAGCGTCTCTGGAACTCGTCGAT KO_PAS_chr3_0513 3′ HA F220 AGAAGTTGATTGAGACTTTCAACGAGCCCCAAGTCAAGGAGGAGTT KO_PAS_chr3_0513 3′HA R 221 GAGTCCAATCACGGCCAATC KO_PAS_chr1-1_0127 5′ HA F 222TGCTTCTTCGGACAGATCGT KO_PAS_chr1-1_0127 5′ HA R 223CTCACTTAATCTTCTGTACTCTGAAGTACTGATTGAAGGGTCGGCA KO_PAS_chr1-1_0127 3′HA F 224 AGAAGTTGATTGAGACTTTCAACGAGTTGTACGGACCAGGAAGCATKO_PAS_chr1-1_0127 3′ HA R 225 TTCCTCTGCCTCTTCCTTGG KO_PAS_chr4_0686 5′HA F 
226 AGCATGCAAACACGAGGTAC KO_PAS_chr4_0686 5′ HA R 227CTCACTTAATCTTCTGTACTCTGAAGAGAGGAAAACGAGCTTGGGT KO_PAS_chr4_0686 3′ HA F228 AGAAGTTGATTGAGACTTTCAACGAGATCAAGGTTGCCAGCGAATG KO_PAS_chr4_0686 3′HA R 229 ACCCTACAGAACCGCAATGA KO_PAS_chr2-2_0159 5′ HA F 230ACAGCCCAAATAGAGACGCA KO_PAS_chr2-2_0159 5′ HA R 231CTCACTTAATCTTCTGTACTCTGAAGAGGAGCCCAGTTTTACGTCA KO_PAS_chr2-2_0159 3′HA F 232 AGAAGTTGATTGAGACTTTCAACGAGTATCCCGCGGTGAAGACTACKO_PAS_chr2-2_0159 3′ HA R 233 GTGTTGCTAAGCCTGTGGAC KO_PAS_chr3_0388 5′HA F 234 TCCTCCTTTCGACGCTTCTT KO_PAS_chr3_0388 5′ HA R 235CTCACTTAATCTTCTGTACTCTGAAGACAGCTGTGAATCATGAAGTTTT KO_PAS_chr3_0388 3′HA F 236 AGAAGTTGATTGAGACTTTCAACGAGATTCTCACTGGCAGAACGGAKO_PAS_chr3_0388 3′ HA R 237 TTTTCACGTTGAGGCCACTG KO_PAS_chr3_0419 5′HA F 238 AGCTCCGCAGTAACAGGAAT KO_PAS_chr3_0419 5′ HA R 239CTCACTTAATCTTCTGTACTCTGAAGTCAAAGCAACTTATGGCGGT KO_PAS_chr3_0419 3′ HA F240 AGAAGTTGATTGAGACTTTCAACGAGCTCTTCGCAGCACCAGAAAG KO_PAS_chr3_0419 3′HA R 241 TCGTTGTTGCTGGTGTTCTG KO_PAS_chr1-3_0258 5′ HA F 242AGTTTGAAGGCACGTTGGTC KO_PAS_chr1-3_0258 5′ HA R 243CTCACTTAATCTTCTGTACTCTGAAGACTCCAACAGGACTTTGAGGT KO_PAS_chr1-3_0258 3′HA F 244 AGAAGTTGATTGAGACTTTCAACGAGAAATGTGGAAGTTGCAGCGGKO_PAS_chr1-3_0258 3′ HA R 245 AGGTTGATCGCCGTCTTGTA KO_PAS_chr4_0913 5′HA F 246 TCTTCATGAGGTGGTAGGCG KO_PAS_chr4_0913 5′ HA R 247CTCACTTAATCTTCTGTACTCTGAAGAGAGGGCAGATGACATACCG KO_PAS_chr4_0913 3′ HA F248 AGAAGTTGATTGAGACTTTCAACGAGGAGAAACTGGAGGTGCTCGT KO_PAS_chr4_0913 3′HA R 249 CAAGGCATTCAGTTGACCGT KO_PAS_chr1-1_0066 5′ HA F 250ACCAACGAGCCTTACAGACA KO_PAS_chr1-1_0066 5′ HA R 251CTCACTTAATCTTCTGTACTCTGAAGTTTTGACCGTCAGTGCATGG KO_PAS_chr1-1_0066 3′HA F 252 AGAAGTTGATTGAGACTTTCAACGAGGTCGGAGGTGTGAGAATTGAKO_PAS_chr1-1_0066 3′ HA R 253 TGGGAACTATGTGGCTCCTCKO_PAS_chr2-2_0310 5′ HA F 254 CGAGCTATCAGTACTCCCGGKO_PAS_chr2-2_0310 5′ HA R 255CTCACTTAATCTTCTGTACTCTGAAGGGTTCTCAGCTGTCCGAGAT KO_PAS_chr2-2_0310 3′HA F 256 AGAAGTTGATTGAGACTTTCAACGAGTAGCATTGCCCATCACAACGKO_PAS_chr2-2_0310 3′ HA R 257 
GTGGGAAGACTATTGATGCGAKO_PAS_chr1-3_0261 5′ HA F 258 GGGAAATCGCTGAGGTGTACKO_PAS_chr1-3_0261 5′ HA R 259CTCACTTAATCTTCTGTACTCTGAAGAGGTCATCTGGAAGCTTTGC KO_PAS_chr1-3_0261 3′HA F 260 AGAAGTTGATTGAGACTTTCAACGAGGGTGGCCAATGGTATTACTTTGAKO_PAS_chr1-3_0261 3′ HA R 261 ATAAGAGCCCCGATACAGGCKO_PAS_chr2-1_0546 5′ HA F 262 CTTGACACACTTTGCTCCTGAKO_PAS_chr2-1_0546 5′ HA R 263CTCACTTAATCTTCTGTACTCTGAAGAGTAGCTGACCTGTTGTGCC KO_PAS_chr2-1_0546 3′HA F 264 AGAAGTTGATTGAGACTTTCAACGAGGGACACCATATGATGCCCGAKO_PAS_chr2-1_0546 3′ HA R 265 CAGATCAAGTCCAAGTCCGCKO_PAS_chr2-2_0398 5′ HA F 266 AGAGACTTTGCGAGAGTCCCKO_PAS_chr2-2_0398 5′ HA R 267CTCACTTAATCTTCTGTACTCTGAAGTGCAATATCCAAACACGCCA KO_PAS_chr2-2_0398 3′HA F 268 AGAAGTTGATTGAGACTTTCAACGAGACTTCTGGAATCTTCGGGCAKO_PAS_chr2-2_0398 3′ HA R 269 GGATGTTTGGGCCATTGTGA KO_PAS_chr4_0835 5′HA F 270 CAATCTCTCGCTTCATCACG KO_PAS_chr4_0835 5′ HA R 271CTCACTTAATCTTCTGTACTCTGAAGTCGCTGTTAACCATAATTCTTTG KO_PAS_chr4_0835 3′HA F 272 AGAAGTTGATTGAGACTTTCAACGAGGCGAGGGTTGAGGAGATTTTKO_PAS_chr4_0835 3′ HA R 273 GGCCATGGCACTATTTTGTT KO_PAS_chr1-1_0491 5′HA F 274 ACGTACTTCCCGCCCAATAA KO_PAS_chr1-1_0491 5′ HA R 275CTCACTTAATCTTCTGTACTCTGAAGCCCACCTAAATTTCGAGTGCA KO_PAS_chr1-1_0491 3′HA F 276 AGAAGTTGATTGAGACTTTCAACGAGACACTTTCGCAGCTTTTGGTKO_PAS_chr1-1_0491 3′ HA R 277 TCCTCCTTGCCATGAAGAGGKO_PAS_chr2-1_0447 5′ HA F 278 GCCTGATGAAGATGATGCCGKO_PAS_chr2-1_0447 5′ HA R 279CTCACTTAATCTTCTGTACTCTGAAGAGGCTCAGTCACCTCTATGA KO_PAS_chr2-1_0447 3′HA F 280 AGAAGTTGATTGAGACTTTCAACGAGTGATCAAGAACACCGTCGAAGKO_PAS_chr2-1_0447 3′ HA R 281 TCCCTTTGTTGGTCGTACGAKO_PAS_chr1-3_0053 5′ HA F 282 TGGTTCAACTTGTAGCGCATKO_PAS_chr1-3_0053 5′ HA R 283CTCACTTAATCTTCTGTACTCTGAAGGGGCTTGCTCAACTTTTGGA KO_PAS_chr1-3_0053 3′HA F 284 AGAAGTTGATTGAGACTTTCAACGAGCGACAATCTGGTAGCGCATCKO_PAS_chr1-3_0053 3′ HA R 285 ATGCTCGTACAAAGACCCCA KO_PAS_chr3_0200 5′HA F 286 TGAGATCTCCAAGTGCAGCA KO_PAS_chr3_0200 5′ HA R 287CTCACTTAATCTTCTGTACTCTGAAGGACGGTCGATTTGGCTCATC KO_PAS_chr3_0200 3′ HA F288 
AGAAGTTGATTGAGACTTTCAACGAGTGAAGAAGCTCAACACTCTGAACKO_PAS_chr3_0200 3′ HA R 289 TGATTGACGGCACCCTGTAT KO_PAS_chr1-3_0105 5′HA F 290 CAATAATTCAGCTGCGCCCT KO_PAS_chr1-3_0105 5′ HA R 291CTCACTTAATCTTCTGTACTCTGAAGCCTCTGTAGCTGCTTGTCCT KO_PAS_chr1-3_0105 3′HA F 292 AGAAGTTGATTGAGACTTTCAACGAGAGGAGTCAGTCGGTCCAAAGKO_PAS_chr1-3_0105 3′ HA R 293 TGTGGGCTGGGATGTGTAAT KO_PAS_chr3_0635 5′HA F 294 AGCACGGTCAAGTAAATCGC KO_PAS_chr3_0635 5′ HA R 295CTCACTTAATCTTCTGTACTCTGAAGTGCTATCACTGATTTGCCCA KO_PAS_chr3_0635 3′ HA F296 AGAAGTTGATTGAGACTTTCAACGAGGGAGATTCCCGGCAAGTATC KO_PAS_chr3_0635 3′HA R 297 GGCTTTCTGACTACCTGGGT KO_PAS_chr4_0503 5′ HA F 298AAAGGGAAGAAGGGTGCAGT KO_PAS_chr4_0503 5′ HA R 299CTCACTTAATCTTCTGTACTCTGAAGAAGGTCGACTCGGGAAACAT KO_PAS_chr4_0503 3′ HA F300 AGAAGTTGATTGAGACTTTCAACGAGTGGTATCCCGACTGCTTTGT KO_PAS_chr4_0503 3′HA R 301 TGGAATGGCTCGAGAATGGT KO_PAS_chr2-1_0569 5′ HA F 302ACCAACAGGCTGAACACTAGA KO_PAS_chr2-1_0569 5′ HA R 303CTCACTTAATCTTCTGTACTCTGAAGTCGTCAGCAGAGAAGGTACA KO_PAS_chr2-1_0569 3′HA F 304 AGAAGTTGATTGAGACTTTCAACGAGACGGACTCCCTAACGAACAAKO_PAS_chr2-1_0569 3′ HA R 305 TCTGATGGTTGGCTTTGCTT KO_PAS_chr3_1223 5′HA F 306 CGGTTTGTGGCCCATCTATG KO_PAS_chr3_1223 5′ HA R 307CTCACTTAATCTTCTGTACTCTGAAGAAAACCGACGCTTGAACTCC KO_PAS_chr3_1223 3′ HA F308 AGAAGTTGATTGAGACTTTCAACGAGAAGTCTTGACCGGAAGCAAC KO_PAS_chr3_1223 3′HA R 309 GGGCCTTAACAAACACCACA KO_PAS_chr2-1_0597 5′ HA F 310TAGAGGCGGAAAGGAACGAG KO_PAS_chr2-1_0597 5′ HA R 311CTCACTTAATCTTCTGTACTCTGAAGTTGCCAAGGGTGTACAAAGC KO_PAS_chr2-1_0597 3′HA F 312 AGAAGTTGATTGAGACTTTCAACGAGACCAAGTTGTTCGACGAAGAKO_PAS_chr2-1_0597 3′ HA R 313 CAACACATACCAGGCGAAGGKO_PAS_chr1-1_0327 5′ HA F 314 CCCTCCTCCGCCATCATTATKO_PAS_chr1-1_0327 5′ HA R 315CTCACTTAATCTTCTGTACTCTGAAGTAGGAGACAACCAAGCCAGC KO_PAS_chr1-1_0327 3′HA F 316 AGAAGTTGATTGAGACTTTCAACGAGGGAGTAGAAAATGGTGCGTCCKO_PAS_chr1-1_0327 3′ HA R 317 AATGGCTCCAAATCACAGGCKO_PAS_chr2-2_0380 5′ HA F 318 GCTTTGAGGAATGCGTGAAGAKO_PAS_chr2-2_0380 5′ HA R 319CTCACTTAATCTTCTGTACTCTGAAGGTAGTGAGAGTGGCGCCTTA 
KO_PAS_chr2-2_0380 3′HA F 320 AGAAGTTGATTGAGACTTTCAACGAGTGGGTACAACGTGACTCTAGGKO_PAS_chr2-2_0380 3′ HA R 321 ACACTCTTAAGGCTCGTCGT KO_PAS_chr3_0928 5′HA F 322 CTCCTCCACTTCAGTATCCGT KO_PAS_chr3_0928 5′ HA R 323CTCACTTAATCTTCTGTACTCTGAAGTTCCTTGAATTTCCGCCACC KO_PAS_chr3_0928 3′ HA F324 AGAAGTTGATTGAGACTTTCAACGAGGAGCAGGCAAGGTTGGATTC KO_PAS_chr3_0928 3′HA R 325 CTGGGCAGCAAATAACGGTT PAS_chr1-3_0184 5′ HA F 326CCAAAGTTGGCTCCGAGTAG PAS_chr1-3_0184 5′ HA R 327CTCACTTAATCTTCTGTACTCTGAAGCCTAACGGTATCGGCTTTGA PAS_chr1-3_0184 3′ HA F328 AGAAGTTGATTGAGACTTTCAACGAGGGCAAAATCCTTTTCCATGA PAS_chr1-3_0184 3′HA R 329 GAAGAAGGCCAAGTGTGATA KO_PAS_chr1-4_0289 5′ HA F 330GACGAGACGCTGTTCCTTTC KO_PAS_chr1-4_0289 5′ HA R 331CTCACTTAATCTTCTGTACTCTGAAGTGTGAAGAGAGGCCACCATT KO_PAS_chr1-4_0289 3′HA F 332 AGAAGTTGATTGAGACTTTCAACGAGTGATCGACTACTTGGCCTCCKO_PAS_chr1-4_0289 3′ HA R 333 AACAACATTCAAGCTGCCGT
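By way of non-limiting illustration, the primers of Table 7 can be inspected computationally. Most of the 5′ HA R primers listed above begin with the 26-nucleotide tail CTCACTTAATCTTCTGTACTCTGAAG, and most of the 3′ HA F primers begin with AGAAGTTGATTGAGACTTTCAACGAG. These tails match, respectively, the reverse complement of the first 26 nucleotides and the last 26 nucleotides of the plasmid sequence of SEQ ID NO: 476 (Table 11). The Python sketch below assumes that the tails serve as overlaps joining each homology arm to the cassette; that reading is an inference from the sequences, and the function names are hypothetical.

```python
# Tails observed at the 5' end of most "5' HA R" and "3' HA F" primers in Table 7.
TAIL_5HA_R = "CTCACTTAATCTTCTGTACTCTGAAG"
TAIL_3HA_F = "AGAAGTTGATTGAGACTTTCAACGAG"

# First and last 26 nt of the cassette plasmid sequence (SEQ ID NO: 476, Table 11).
CASSETTE_START = "cttcagagtacagaagattaagtgag"
CASSETTE_END = "agaagttgattgagactttcaacgag"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (case is preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_primer(primer: str, tail: str) -> tuple[str, str]:
    """Split a primer into (tail, locus-specific part).

    Returns ("", primer) when the primer does not start with the given tail.
    """
    if primer.upper().startswith(tail.upper()):
        return primer[: len(tail)], primer[len(tail):]
    return "", primer


# SEQ ID NO: 83 and 84 (KO_PAS_chr4_0584 5' HA R and 3' HA F) from Table 7.
ha5_r = "CTCACTTAATCTTCTGTACTCTGAAGAAGTCCAACTGTTGAACGCC"
ha3_f = "AGAAGTTGATTGAGACTTTCAACGAGGGTCCCCTTCAGCTACCTTT"

if __name__ == "__main__":
    # The 5' HA R tail is the reverse complement of the cassette start,
    # and the 3' HA F tail is identical to the cassette end.
    print(reverse_complement(TAIL_5HA_R).lower() == CASSETTE_START)
    print(TAIL_3HA_F.lower() == CASSETTE_END)
    # Locus-specific (genome-annealing) portions of the two primers.
    print(split_primer(ha5_r, TAIL_5HA_R)[1])
    print(split_primer(ha3_f, TAIL_3HA_F)[1])
```

Primers that lack these tails, such as the PAS chr1-1 0174 and PAS chr1-1 0226 entries, are returned unsplit by `split_primer`.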

TABLE 8 Forward and reverse primers for amplifying modified sequences

Description  SEQ ID NO:  Sequence (5′ to 3′)
KO_PAS_chr3_1087 Verification F  334  ATCGGCAAAGATGAAGCGAC
KO_PAS_chr3_1087 Verification R  335  GCTGGACACTTCTGAGCTCA
KO_PAS_chr4_0584 Verification F  336  ACTTGTCAGGACGATACGGA
KO_PAS_chr4_0584 Verification R  337  CCGGTCTCCCTGGAAATAGA
KO_PAS_chr3_0076 Verification F  338  GCGAGGTCCTTGTCAATGAG
KO_PAS_chr3_0076 Verification R  339  ACAAGAACTCGGGCTCCTTT
KO_PAS_chr3_0691 Verification F  340  TTGCAGCGCTCCATAATGTC
KO_PAS_chr3_0691 Verification R  341  GCTGATTCTGAGAACGCTGG
KO_PAS_chr3_0303 Verification F  342  GCCATTCTTCGGTGCAGTAG
KO_PAS_chr3_0303 Verification R  343  TAGAGTTGTCCCAAACGGCA
KO_PAS_chr3_0815 Verification F  344  CGTGGTTCTCGAGGCTCTAT
KO_PAS_chr3_0815 Verification R  345  GGAGTTGGAACGTCGTAGGA
KO_PAS_chr3_1157 Verification F  346  AGTTGTCCGTCATTAGCCCT
KO_PAS_chr3_1157 Verification R  347  TGTTCCCTTTCGGCTAGACA
KO_PAS_chr1-4_0164 Verification F  348  ACGGTTGAGGGCATTACGTA
KO_PAS_chr1-4_0164 Verification R  349  TTGTCTTCCACCCCTTCGTT
KO_PAS_chr3_0979 Verification F  350  GGTTGGCCTTGGACATTGTT
KO_PAS_chr3_0979 Verification R  351  TGCTCTTCGGTACTCATGCT
KO_PAS_chr3_0803 Verification F  352  TTTGGCCATGCTGAGCTTTT
KO_PAS_chr3_0803 Verification R  353  AAGCCCGATCACTTGCATTT
KO_PAS_chr3_0394 Verification F  354  CACCTAATGTTTGGCACCCC
KO_PAS_chr3_0394 Verification R  355  ATCCCAGACTGACATCGCAA
KO_PAS_chr2-1_0366 Verification F  356  CCGCCAGAAATTCATGCCAT
KO_PAS_chr2-1_0366 Verification R  357  TCGTTTCACTGTACCATGCA
KO_PAS_chr3_0842 Verification F  358  ACCAGTCCGCATTTTCACTG
KO_PAS_chr3_0842 Verification R  359  GTGGACAGCTGCAATCGTAG
KO_PAS_chr1-3_0195 Verification F  360  CAACTGGGAAGCCTGCATTT
KO_PAS_chr1-3_0195 Verification R  361  CCTTGCATATCCGTTTGCCA
KO_PAS_chr1-4_0052 Verification F  362  GGAGGTTCAGGAGCAGGAAT
KO_PAS_chr1-4_0052 Verification R  363  CGGTTTCATCTGTTGCCTCC
KO_PAS_chr2-2_0057 Verification F  364  GTCGCCCATGTTCTTTCGAT
KO_PAS_chr2-2_0057 Verification R  365  CAAACAGGCTGGAAACCACA
KO_PAS_chr1-3_0150 Verification F  366  AATCTCCACGTTCAGTTGCG
KO_PAS_chr1-3_0150 Verification R  367  TCATCCCTTGAAAACCCCGA
KO_PAS_chr1-3_0221 Verification F  368  TTGTGGAGGGAGATTCAGGC
KO_PAS_chr1-3_0221 Verification R  369  AAGGTAAGGAACGTGCTTGC
KO_PAS_FragD_0022 Verification F  370  GTTCTACTGTTCACGTGCTCT
KO_PAS_FragD_0022 Verification R  371  ACCGGTTAGAATACATGCTGC
KO_PAS_chr2-1_0159 Verification F  372  CGAAAAGAAGCTGGACTCCG
KO_PAS_chr2-1_0159 Verification R  373  TTCCATCGTACGACCAGTGT
KO_PAS_chr2-1_0326 Verification F  374  AGCGATGAGGCCAACAGTAT
KO_PAS_chr2-1_0326 Verification R  375  TGTCCAGCCCAAAAGACTGA
KO_PAS_chr2-2_0056 Verification F  376  CTCCTGGGGCTCGTACTAAG
KO_PAS_chr2-2_0056 Verification R  377  CCTCAATAACGACGGCCTTG
KO_PAS_chr1-4_0611 Verification F  378  CCTTTTCCTGATCAGTGGGG
KO_PAS_chr1-4_0611 Verification R  379  TGTTGGGGAATGAAACACGA
KO_PAS_chr1-1_0274 Verification F  380  GAAGGACGAGTAGGGTTGCT
KO_PAS_chr1-1_0274 Verification R  381  TCCTGATCTGGCTCGTTTGT
KO_PAS_chr4_0834 Verification F  382  ACCTCCAACTCCTGAAAGCA
KO_PAS_chr4_0834 Verification R  383  CCTCGAGTCTGGGCTTTACA
KO_PAS_chr3_0896 Verification F  384  GGAGAGATGCCAGACCAAGT
KO_PAS_chr3_0896 Verification R  385  AGCCTGTTCTACTGCATACGT
KO_PAS_chr3_0561 Verification F  386  CCATTTCTTGTACCCTGGGC
KO_PAS_chr3_0561 Verification R  387  GCAGAAAAGGCGCGAATTTC
KO_PAS_chr3_0633 Verification F  388  GGGAAAGGATGTGGACCAAC
KO_PAS_chr3_0633 Verification R  389  TGGCCAAGAGTGTCCAATTG
KO_PAS_chr4_0013 Verification F  390  TAACAGATGGCGCACGTAGA
KO_PAS_chr4_0013 Verification R  391  CCTTGCGTTCCCAGGTAAAG
KO_PAS_chr1-1_0379 Verification F  392  TGTGGTATGGTTTGGGGCTA
KO_PAS_chr1-1_0379 Verification R  393  ACTCCCGTTCCTCCATGTTC
KO_PAS_chr2-1_0172 Verification F  394  ACGGTACAAAAGGCGTTTCA
KO_PAS_chr2-1_0172 Verification R  395  AGTCAAACTCGGTGGTAGGT
KO_PAS_chr3_0866 Verification F  396  CGGTTATCATGTGCCTGCTC
KO_PAS_chr3_0866 Verification R  397  ATGTTGCTGCTCCGAAATCC
KO_PAS_chr3_0299 Verification F  398  GATCTGCTGGCCTTGAGAGT
KO_PAS_chr3_0299 Verification R  399  CTATGTCCTGGTGTTTGCCG
KO_PAS_chr1-4_0251 Verification F  400  GCCAATGATGATCTCGCAGG
KO_PAS_chr1-4_0251 Verification R  401  GCCTTTGATATGCCGTCGTT
KO_PAS_chr4_0874 Verification F  402  TCGAGTAATGCTTCCCACCA
KO_PAS_chr4_0874 Verification R  403  AGCTTTCACAACAGCGATCG
KO_PAS_chr3_0513 Verification F  404  TGATTGCTTCTGGGTTGCTG
KO_PAS_chr3_0513 Verification R  405  CAAAACCGGCGTAAAATGGC
KO_PAS_chr1-1_0127 Verification F  406  TTGTGCTGCATCTGTGTGAG
KO_PAS_chr1-1_0127 Verification R  407  AGCCTACAAGTGGTTACAGGT
KO_PAS_chr4_0686 Verification F  408  GGAAACCGACCAGCCTAAAG
KO_PAS_chr4_0686 Verification R  409  AGTCGCACCAGGTTATCACA
KO_PAS_chr2-2_0159 Verification F  410  GGAAAGCTGCCCAGAAACTC
KO_PAS_chr2-2_0159 Verification R  411  TGAGAGGATTCGTTGTGGCT
KO_PAS_chr3_0388 Verification F  412  CTATGTCGAAGTAGCGGTGC
KO_PAS_chr3_0388 Verification R  413  AGAGTGGCACTGCTATCGAA
KO_PAS_chr3_0419 Verification F  414  CGTACAAACTTGGCAGCTGT
KO_PAS_chr3_0419 Verification R  415  GCTGTGTTGTAAATTCCGGC
KO_PAS_chr1-3_0258 Verification F  416  ACAACCCGGAAGACAACTCT
KO_PAS_chr1-3_0258 Verification R  417  TGTCGTTGCCTTCCCGATAT
KO_PAS_chr4_0913 Verification F  418  GAAGATGGGAGAGGGTGCTT
KO_PAS_chr4_0913 Verification R  419  CTTGTTGACGACGGTAGCAG
KO_PAS_chr1-1_0066 Verification F  420  CCCTAGTCTCGTTCGAAGGG
KO_PAS_chr1-1_0066 Verification R  421  GGCACAGCAGGTTTTCGTAT
KO_PAS_chr2-2_0310 Verification F  422  GGAGATTCTGATGCTACCCCA
KO_PAS_chr2-2_0310 Verification R  423  TGGAGCCATCAGATCAGGAC
KO_PAS_chr1-3_0261 Verification F  424  CCTGTTCTTGCAAGCCTTCA
KO_PAS_chr1-3_0261 Verification R  425  TAAGACATGCGACCACCAGA
KO_PAS_chr2-1_0546 Verification F  426  CATGGCCAATGTCGAACTGT
KO_PAS_chr2-1_0546 Verification R  427  AGCTGGCTGAAAAGGTGTTG
KO_PAS_chr2-2_0398 Verification F  428  CTCAGTGTTGGAAAGCACCC
KO_PAS_chr2-2_0398 Verification R  429  TAGGGAATCTTTGGTGGCGT
KO_PAS_chr4_0835 Verification F  430  GGAACCTAGAGCGAGCAACA
KO_PAS_chr4_0835 Verification R  431  CAGGCTCTATTGTCGACGTG
KO_PAS_chr1-1_0491 Verification F  432  GGAGGTGATGACAATGCCAC
KO_PAS_chr1-1_0491 Verification R  433  CTGTGAAGCTCCTCCTACGT
KO_PAS_chr2-1_0447 Verification F  434  GGACACTGCTGGACAAGAGA
KO_PAS_chr2-1_0447 Verification R  435  TACTGACGCCGAAGAGCTAG
KO_PAS_chr1-3_0053 Verification F  436  CCGATCGCAAAATAGTGGCA
KO_PAS_chr1-3_0053 Verification R  437  GTTGTGGTTGTATGCGGTCA
KO_PAS_chr3_0200 Verification F  438  CAATAACTCCACTGGTGCCG
KO_PAS_chr3_0200 Verification R  439  TCGTTATACTCCAGCGTGCT
KO_PAS_chr1-3_0105 Verification F  440  GGGCTCAAAATCTGGAACCA
KO_PAS_chr1-3_0105 Verification R  441  CAATGCAGTACTCACCGGTG
KO_PAS_chr3_0635 Verification F  442  AAGCTGACGACCCCTTAGAC
KO_PAS_chr3_0635 Verification R  443  CTATCGTGTCTGGGCTGCTA
KO_PAS_chr4_0503 Verification F  444  AAGGAGATTGCCGCAACTCT
KO_PAS_chr4_0503 Verification R  445  GTGGAGTCAGAGTCGAGAGG
KO_PAS_chr2-1_0569 Verification F  446  CCCAGCTTTTATACGGCTTGG
KO_PAS_chr2-1_0569 Verification R  447  CAGCAAAAGCTCGTGATCCA
KO_PAS_chr3_1223 Verification F  448  TGCGGGTAGTCGATTGATGT
KO_PAS_chr3_1223 Verification R  449  TCACGTATCTCAGCAACAGGA
KO_PAS_chr2-1_0597 Verification F  450  GGACCTAGGAAATACGCCCA
KO_PAS_chr2-1_0597 Verification R  451  ACTCCAGTTCCACAAGTCCA
KO_PAS_chr1-1_0327 Verification F  452  ACTGCCAACCGTTTACTCCA
KO_PAS_chr1-1_0327 Verification R  453  GCGCGGAAGATTAAAGTCGT
KO_PAS_chr2-2_0380 Verification F  454  TTGGACTCGATCGATGAGGG
KO_PAS_chr2-2_0380 Verification R  455  TGATGACTTCCAAGATGCGC
KO_PAS_chr3_0928 Verification F  456  TCACCTGGAGCAACTGATGT
KO_PAS_chr3_0928 Verification R  457  GTTTGGTACGCTTGTAGGCC
PAS_chr1-3_0184 Verification F  458  GATGAGCAAGCATCCATTCA
PAS_chr1-3_0184 Verification R  459  AAAGACAGGAGCGTGAGCAT
KO_PAS_chr1-4_0289 Verification F  460  CTCAACTTCGCTTGCCCTTT
KO_PAS_chr1-4_0289 Verification R  461  TGGGAAACAGAACGATMAACT
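As a non-limiting illustration of how a verification primer pair such as those of Table 8 might be evaluated before use, the following Python sketch predicts the size of the product that a forward and reverse primer would amplify from a given template. The template strings and primers in the example are toy sequences invented for illustration and are not taken from the P. pastoris genome; the function names are likewise hypothetical. The sketch assumes exact primer matches and a single binding site for each primer.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str):
    """Predict the PCR product length for an exact-match primer pair.

    The forward primer is searched on the given strand; the reverse primer
    is searched as its reverse complement. Returns None when either primer
    is absent or the sites are not in a productive orientation.
    """
    t = template.upper()
    start = t.find(fwd.upper())
    rev_site = t.find(reverse_complement(rev.upper()))
    if start == -1 or rev_site == -1 or rev_site < start:
        return None
    return rev_site + len(rev) - start


# Toy data (hypothetical): the same primer sites flank inserts of different size.
fwd_primer = "ACGTTGCA"
rev_primer = reverse_complement("GGATCCAT")
unmodified_locus = "TT" + "ACGTTGCA" + "C" * 40 + "GGATCCAT" + "AA"
modified_locus = "TT" + "ACGTTGCA" + "T" * 15 + "GGATCCAT" + "AA"

if __name__ == "__main__":
    print(amplicon_length(unmodified_locus, fwd_primer, rev_primer))
    print(amplicon_length(modified_locus, fwd_primer, rev_primer))
```

A difference in predicted product size between the unmodified and the modified locus is what allows the two to be distinguished on a gel; which of the two is larger depends on the sizes of the deleted region and of the inserted cassette.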

TABLE 9 18B Vector

Description: 18B silk-like polypeptide encoding sequence
SEQ ID NO: 462
5′ to 3′ Sequence:
ggtggttacg gtccaggcgc tggtcaacaa ggtccaggaa gtggtggtca acaaggacct 60
ggcggtcaag gaccctacgg tagtggccaa caaggtccag gtggagcagg acagcagggt 120
ccgggaggcc aaggacctta cggaccaggt gctgctgctg ccgccgctgc cgctgccgga 180
ggttacggtc caggagccgg acaacagggt ccaggtggag ctggacaaca aggtccagga 240
tcacaaggtc ctggtggaca aggtccatac ggtcctggtg ctggtcaaca gggaccaggt 300
agtcaaggac ctggttcagg tggtcagcag ggtccaggag gacagggtcc ttacggccct 360
tctgccgctg cagcagcagc cgctgccgca ggaggatacg gacctggtgc tggacaacga 420
tctcaaggac caggaggaca aggtccttat ggacctggcg ctggccaaca aggacctggt 480
tctcagggtc caggttcagg aggccaacaa ggcccaggag gtcaaggacc atacggacca 540
tccgctgcgg cagctgcagc tgctgcaggt ggatatggcc caggagccgg acaacagggt 600
cctggttcac aaggtccagg atctggtggt caacagggac caggcggcca gggaccttat 660
ggtccaggag ccgctgcagc agcagcagct gttggaggtt acggccctgg tgccggtcaa 720
caaggcccag gatctcaggg tcctggatct ggaggacaac aaggtcctgg aggtcagggt 780
ccatacggac cttcagcagc agctgctgct gcagccgctg gtggttatgg acctggtgct 840
ggtcaacaag gaccgggttc tcagggtccg ggttcaggag gtcagcaggg ccctggtgga 900
caaggacctt atggacctag tgcggctgca gcagctgccg ccgcaggtgg ttacggtcca 960
ggcgctggtc aacaaggtcc aggaagtggt ggtcaacaag gacctggcgg tcaaggaccc 1020
tacggtagtg gccaacaagg tccaggtgga gcaggacagc agggtccggg aggccaagga 1080
ccttacggac caggtgctgc tgctgccgcc gctgccgctg ccggaggtta cggtccagga 1140
gccggacaac agggtccagg tggagctgga caacaaggtc caggatcaca aggtcctggt 1200
ggacaaggtc catacggtcc tggtgctggt caacagggac caggtagtca aggacctggt 1260
tcaggtggtc agcagggtcc aggaggacag ggtccttacg gcccttctgc cgctgcagca 1320
gcagccgctg ccgcaggagg atacggacct ggtgctggac aacgatctca aggaccagga 1380
ggacaaggtc cttatggacc tggcgctggc caacaaggac ctggttctca gggtccaggt 1440
tcaggaggcc aacaaggccc aggaggtcaa ggaccatacg gaccatccgc tgcggcagct 1500
gcagctgctg caggtggata tggcccagga gccggacaac agggtcctgg ttcacaaggt 1560
ccaggatctg gtggtcaaca gggaccaggc ggccagggac cttatggtcc aggagccgct 1620
gcagcagcag cagctgttgg aggttacggc cctggtgccg gtcaacaagg cccaggatct 1680
cagggtcctg gatctggagg acaacaaggt cctggaggtc agggtccata cggaccttca 1740
gcagcagctg ctgctgcagc cgctggtggt tatggacctg gtgctggtca acaaggaccg 1800
ggttctcagg gtccgggttc aggaggtcag cagggccctg gtggacaagg accttatgga 1860
cctagtgcgg ctgcagcagc tgccgccgca ggtggttacg gtccaggcgc tggtcaacaa 1920
ggtccaggaa gtggtggtca acaaggacct ggcggtcaag gaccctacgg tagtggccaa 1980
caaggtccag gtggagcagg acagcagggt ccgggaggcc aaggacctta cggaccaggt 2040
gctgctgctg ccgccgctgc cgctgccgga ggttacggtc caggagccgg acaacagggt 2100
ccaggtggag ctggacaaca aggtccagga tcacaaggtc ctggtggaca aggtccatac 2160
ggtcctggtg ctggtcaaca gggaccaggt agtcaaggac ctggttcagg tggtcagcag 2220
ggtccaggag gacagggtcc ttacggccct tctgccgctg cagcagcagc cgctgccgca 2280
ggaggatacg gacctggtgc tggacaacga tctcaaggac caggaggaca aggtccttat 2340
ggacctggcg ctggccaaca aggacctggt tctcagggtc caggttcagg aggccaacaa 2400
ggcccaggag gtcaaggacc atacggacca tccgctgcgg cagctgcagc tgctgcaggt 2460
ggatatggcc caggagccgg acaacagggt cctggttcac aaggtccagg atctggtggt 2520
caacagggac caggcggcca gggaccttat ggtccaggag ccgctgcagc agcagcagct 2580
gttggaggtt acggccctgg tgccggtcaa caaggcccag gatctcaggg tcctggatct 2640
ggaggacaac aaggtcctgg aggtcagggt ccatacggac cttcagcagc agctgctgct 2700
gcagccgctg gtggttatgg acctggtgct ggtcaacaag gaccgggttc tcagggtccg 2760
ggttcaggag gtcagcaggg ccctggtgga caaggacctt atggacctag tgcggctgca 2820
gcagctgccg ccgca 2835

Description: 18B polypeptide sequence
SEQ ID NO: 463
Sequence:
GGYGPGAGQQGPGSGGQQGPGGQGPYGSGQQGPGGAGQQGPGGQGPYGPGAAAAAAAAAG
GYGPGAGQQGPGGAGQQGPGSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGP
SAAAAAAAAAGGYGPGAGQRSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGP
SAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPGAAAAAAAVGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSGGQQGPGGQGPYGSGQQGPGGAGQQGPGGQGPYGPGAAAAAAAAAGGYGPGAGQQGPGGAGQQGPGSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAAGGYGPGAGQRSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPGAAAAAAAVGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSGGQQGPGGQGPYGSGQQGPGGAGQQGPGGQGPYGPGAAAAAAAAAGGYGPGAGQQGPGGAGQQGPGSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAAGGYGPGAGQRSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPGAAAAAAAVGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAA

Description: Repeat sequence of a silk-like polypeptide
SEQ ID NO: 464
Sequence:
GGYGPGAGQQGPGSGGQQGPGGQGPYGSGQQGPGGAGQQGPGGQGPYGPGAAAAAAAAAG
GYGPGAGQQGPGGAGQQGPGSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGP
SAAAAAAAAAGGYGPGAGQRSQGPGGQGPYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGP
SAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPGAAAAAAAVGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAAGGYGPGAGQQGPGSQGPGSGGQQGPGGQGPYGPSAAAAAAAA
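By way of non-limiting illustration, the relationship between the sequences of Table 9 can be checked with a short script. The 2835-nucleotide encoding sequence of SEQ ID NO: 462 corresponds to 945 codons, and translating its first 60 nucleotides with the standard genetic code gives the first 20 residues of SEQ ID NO: 463. The Python sketch below shows that translation, together with a helper that tests whether one sequence is an exact tandem repetition of a shorter unit (for example, SEQ ID NO: 463 against SEQ ID NO: 464). The function names are hypothetical, and no result for the full-length sequences is asserted here.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; spaces are ignored, a trailing
    partial codon is dropped, and stop codons are emitted as '*'."""
    seq = dna.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def count_tandem_copies(seq: str, unit: str):
    """Return n if seq is exactly n tandem copies of unit, otherwise None."""
    if not unit or len(seq) % len(unit) != 0:
        return None
    n = len(seq) // len(unit)
    return n if unit * n == seq else None


# First 60 nucleotides of SEQ ID NO: 462, as listed in Table 9.
first_60_nt = (
    "ggtggttacg gtccaggcgc tggtcaacaa ggtccaggaa gtggtggtca acaaggacct"
)

if __name__ == "__main__":
    print(translate(first_60_nt))
    # Toy example of the tandem-copy check (not a sequence from the tables).
    print(count_tandem_copies("GPGQQGPGQQGPGQQ", "GPGQQ"))
```

The same `translate` call can be applied to the full encoding sequence and the result compared against the listed polypeptide sequence.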

TABLE 10 Zeocin Cassette with HA arms for KU70 deletion in P. pastorisSEQ ID Description NO: 5′ to 3′ Sequence Plasmid 465ggagttgaatcacatcttactggatagcgagctttttgacgaagtgaaaatttctaattttaaacaagaggaaggggtcasequenceaaaacggagatatcttatacttggaaaaagagatgacaatcagtgatttcatcaattttgtatctagttggccttctgtgttttcgtggaagcagcaacgaggaaaggagggtatcctagatgatttttacaacgaactgaacgactgctttgaggggggtaacatgaaagtaatatggaactccgtcctagtatttgccaggaggaagcaaagggttgtataggctttagtacttatagaggaaacggggttacgtgcaagcgcgcatgcctgagctttgaggggggggactttcacatctcttcttctcacacttagccctaacacagagaataataaaaagcattgcaagatgagtgttgtcagcaagcaatacgacatccacgaaggcattatctttgtaattgaattgaccccggagcttcacgcgccggcttcagaagggaaatctcagctccagatcatcttagagaatgtcagtgaggttatttctgagctaatcattaccttgcccggtacaggaatagggtgttaccttattaattacgacggtggtcaaaacgacgaaatttaccccatttttgagttacaagacctgaatttggaaatgatgaaacaattgtaccaagtcttggaggaccatgtaagtgggcttaatcctctcgagaagcaattcccaattgaacacagtaaaccgttatcagccactctgttctttcacttaaggtctcttttttacatggcgaagactcataagcgtactggaagacattacaacttgaaaaagattttcttgttcactaataacgataaaccttacaatggaaactctcagctgagagttcccttgaagaaaaccctggctgattacaatgacgtagacattactttgattccgtttcttctgaacaagccttcaggtgtcaagtttgacaagacggaatactcagaaattttgttctatgataaagatgcttgttcgatgtcaattgaggagatccgccaacgaatttctagacataaggagatcaagcgggtttacttcacctgtcctttgaaaatcgcaaataacttgtgcatttctgtgaaaggttattctatgttttatcatgaaactccaaggaagatcaaatttgtcgtcaatgagggttcaactttcaaagatgtggagacaaaatctcagtttgtcgatccaacatccggaaaagagttttccagtgaacagctgatcaaagcatatcctctaggtgccgatgcttacattcctttaaactcagagcaagtcaaaacaataaatcgatttaatgatatcatcaatatcccctctttggaaattctaggtttcagggatatatctaattggttgccacagtatcagtttggcaaagcatcgtttttatcccctaataactatggtgattttacacattcgcagagaacatttagttgtcttcagtaatgtcttgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacc
caaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttttgacggctagctcagtcctaggtacgctagcattaaagaggagaaaatggctaaactgacctctgctgttccggttctgaccgctcgtgacgttgctggtgctgttgagttctggaccgaccgtctgggtttctctcgtgacttcgttgaagacgacttcgctggtgttgttcgtgacgacgttaccctgttcatctctgctgttcaggaccaggttgttccggacaacaccctggcttgggtttgggttcgtggtctggacgaactgtacgctgaatggtctgaagttgtttctaccaacttccgtgacgcttctggtccggctatgaccgaaatcggtgaacagccgtggggtcgtgagttcgctctgcgtgacccggctggtaactgcgttcacttcgttgctgaagaacaggactaacacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctgtattagtttcacttttcagcaacctggtcggaaagatccacatcaagaatggataccaaccccaagagtatgaaaatccttccctacaatggcacttcaaaatgttacgtgacgattaccttcaattggaacacgatatcgacatcagtgacccccttgagaaacaaaagtacataaacagcctcgatgagacaaaaaccaagatcatgaaactacgggactatgtcaaggaaactgccgatgatgacgacccttcacggcttgccaacactctcaaagagctcaaccaagagctgaacaaaatttccaactttgatatcatcgccaataagaagccaaagacccccacgacagtagaccctgttcctactgatgatgacatcatcaacgcctggaaggcaggaactctgaacggtttcaaggtggatcaattacgaaaatacgtaaggtcacgaaacaactttctggagacggcctccaaaaaggcagatctcatcgccaacattgacaagtactttcagcagaagttcaaagagactaaggcctgattcgtgttccttactttttcctcgcaacgtgtttttttcccaccacattgcctatgttgtaatgcaatgcagatgctggcccagtttttgacgattctcgaaaattggcattttcgtcgatgccattggccaaactgaaaattcaagacaaaatagattggattttatctgcaacgtcttccacctacacaaccactctacaaacttcagacaaacatgtttataaaagcagctactagatccaaaatgacaagttcgttattctctactacgtttgttgtggcatttggattggtggctagcaacaacctcttgccatgtcctgttgaccactctatgaataacgagactccgcaagaattgaaaccattgcaggctgaatcttctactagaaagttgaactcttccgcttaagtcaaataaaactactgacacagatgatgcacagaaacaacggatcacgctcttgactgattagtcccgtcattttggttctc
attttcttcacagtcacctatcaatgtatgatcacctggaaggatttccctacgatacttcaaatcttttacttgataatattactcattatggctcaggaatgcagactgcctgattcaagacgctgctcttcttatttaacacttgtacactaaccccatggaagccagggaagggaataaccatctctctggtaataaatcggtctttatttatgcatagaaaaggaatctattatatttcgttcatttggcactctgctaactgtagattaacgggtctcgtaaattcaaaatcttcttccgatcaaaccggggtgaaatattacttctcgtgcatagctaattttcaaataaccgtcctaaaatgaacggtcatttacctggactctcttgccaaatgggcaacaaaacataaagctgatcagaacgtaactagtctctcggaatccat HA F 466 ggagttgaatcacatcttactg KU70 HA 1 467gacaactaaatgttctctgcgaatgtgtaaaatcaccatagttattaggggataaaaacgatgctttgccaaactgatactgtggcaaccaattagatatatccctgaaacctagaatttccaaagaggggatattgatgatatcattaaatcgatttattgttttgacttgctctgagtttaaaggaatgtaagcatcggcacctagaggatatgctttgatcagctgttcactggaaaactcttttccggatgttggatcgacaaactgagattttgtctccacatctttgaaagttgaaccctcattgacgacaaatttgatcttccttggagtttcatgataaaacatagaataacctttcacagaaatgcacaagttatttgcgattttcaaaggacaggtgaagtaaacccgcttgatctccttatgtctagaaattcgttggcggatctcctcaattgacatcgaacaagcatctttatcatagaacaaaatttctgagtattccgtcttgtcaaacttgacacctgaaggcttgttcagaagaaacggaatcaaagtaatgtctacgtcattgtaatcagccagggttttcttcaagggaactctcagctgagagtttccattgtaaggtttatcgttattagtgaacaagaaaatctttttcaagttgtaatgtcttccagtacgcttatgagtcttcgccatgtaaaaaagagaccttaagtgaaagaacagagtggctgataacggtttactgtgttcaattgggaattgcttctcgagaggattaagcccacttacatggtcctccaagacttggtacaattgtttcatcatttccaaattcaggtcttgtaactcaaaaatggggtaaatttcgtcgttttgaccaccgtcgtaattaataaggtaacaccctattcctgtaccgggcaaggtaatgattagctcagaaataacctcactgacattctctaagatgatctggagctgagatttcccttctgaagccggcgcgtgaagctccggggtcaattcaattacaaagataatgccttcgtggatgtcgtattgcttgctgacaacactcat KU70 HA 
2468tcaggccttagtctctttgaacttctgctgaaagtacttgtcaatgttggcgatgagatctgcctttttggaggccgtctccagaaagttgtttcgtgaccttacgtattttcgtaattgatccaccttgaaaccgttcagagttcctgccttccaggcgttgatgatgtcatcatcagtaggaacagggtctactgtcgtgggggtctttggcttcttattggcgatgatatcaaagttggaaattttgttcagctcttggttgagctctttgagagtgttggcaagccgtgaagggtcgtcatcatcggcagtttccttgacatagtcccgtagtttcatgatcttggtttttgtctcatcgaggctgtttatgtacttttgtttctcaagggggtcactgatgtcgatatcgtgttccaattgaaggtaatcgtcacgtaacattttgaagtgccattgtagggaaggattttcatactcttggggttggtatccattcttgatgtggatctttccgaccaggttgctgaaaagtgaaactaatacpILV5 469ttcagtaatgtcttgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacccaaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttRM2734; testR 470 cagaggccaaacattccacc pproRBS 471 ttaaagaggagaaaSh ble (codon 472atggctaaactgacctctgctgttccggttctgaccgctcgtgacgttgctggtgctgttgagttctggaccgaccgtctoptimized)gggtttctctcgtgacttcgttgaagacgacttcgctggtgttgttcgtgacgacgttaccctgttcatctctgctgttcaggaccaggttgttccggacaacaccctggcttgggtttgggttcgtggtctggacgaactgtacgctgaatggtctgaagttgtttctaccaacttccgtgacgcttctggtccggctatgaccgaaatcggtgaacagccgtggggtcgtgagttcgctctgcgtgacccggctggtaactgcgttcacttcgttgctgaagaacaggactaa CYC1 473cacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttaterminatortgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctRm3386; F test 474 aggagttagacaacctgaag oligo HA R 475gtaactagtctctcggaatccat

TABLE 11 Nourseothricin Cassette for protease deletion in P. pastorisSEQ ID Description NO: 5′ to 3′ Sequence Plasmid 476cttcagagtacagaagattaagtgagagaattctaccgttcgtatagcatacattatacgaagttatttcagtaatgtctsequencetgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacccaaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttttgacggctagctcagtcctaggtacgctagcattaaagaggagaaaatgactactcttgatgacacagcctacagatataggacatcagttccgggtgacgcagaggctatcgaagccttggacggttcattcactactgatacggtgtttagagtcaccgctacaggtgatggcttcaccttgagagaggttcctgtagacccacccttaacgaaagttttccctgatgacgaatcggatgacgagtctgatgctggtgaggacggtgaccctgattccagaacatttgtcgcatacggagatgatggtgacctggctggctttgttgtggtgtcctacagcggatggaatcgtagactcacagttgaggacatcgaagttgcacctgaacatcgtggtcacggtgttggtcgtgcactgatgggactggcaacagagtttgctagagaaagaggagccggacatttgtggttagaagtgaccaatgtcaacgctcctgctattcacgcatataggcgaatgggtttcactttgtgcggtcttgatactgctttgtatgacggaactgcttctgatggtgaacaagctctttacatgagtatgccatgtccatagcacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctataacttcgtatagcatacattataccttgttatgcggccgcaagaagttgattgagactttcaacgag AOX1 pA 477 cttcagagtacagaagattaagtgagaterminator Lox71 F 478 taccgttcgtatagcatacattatacgaagttat pILV5 
479ttcagtaatgtcttgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacccaaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttpproRBS 480 ttaaagaggagaaa nat 481atgactactcttgatgacacagcctacagatataggacatcagttccgggtgacgcagaggctatcgaagccttggacgg(Nourseothricinttcattcactactgatacggtgtttagagtcaccgctacaggtgatggcttcaccttgagagaggttcctgtagacccacresistance)ccttaacgaaagttttccctgatgacgaatcggatgacgagtctgatgctggtgaggacggtgaccctgattccagaacatttgtcgcatacggagatgatggtgacctggctggctttgttgtggtgtcctacagcggatggaatcgtagactcacagttgaggacatcgaagttgcacctgaacatcgtggtcacggtgttggtcgtgcactgatgggactggcaacagagtttgctagagaaagaggagccggacatttgtggttagaagtgaccaatgtcaacgctcctgctattcacgcatataggcgaatgggtttcactttgtgcggtcttgatactgctttgtatgacggaactgcttctgatggtgaacaagctctttacatgagtatgccatgtccatag CYC1 482cacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttaterminatortgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctLoxKR3 F 483 ataacttcgtatagcatacattataccttgttat HSP82 484gcggccgcaagaagttgattgagactttcaacgag

TABLE 12Exemplary nourseothricin cassettes with HA arms for protease deletion in P. pastorisSEQ ID Description NO: 5′ to 3′ Sequence Nourseothricin 485tactacaggctggctgttcctcgcatggtgtttaatgtcctgactgggttttcgtttatcggtattaccggagccaccttgcassette withactgtaagggaacgatactggactaagagagtaatgcgaaaggcaacagcgtttctggcgaacctaatcaatgacggttachomology armsgagtttactactcctaaagccagtcttattttgctagagcgagtcaacgcttacttaaagggccagggacctaattatgactargetingatcgattttgacgagcaggaggcgttcattaaagaaatggaggagttgaggacctctggtggatatgagaacagatactcaPAS_chr4_0584tattcaggaaccgatgaaacacccagagatccgggttgcctgtttcttcccattgctttaaataaatggcactttgatgtgctagactgcctgaggatatacggtactcaggaagatctggaatctaaattattaagtgttcagcaattggtgttacaatgttgcatgaagcacagtggcatgactccagacatggtctttgcaacggaagtagctcagaagccgaccttcgaagacgacatagtttgtgatgatattgacgcttatgcccaggggggtgattgtctagattattgttacacgccaagcaattactccagaactttagaaattcatggcaagattgctaccttacaacgagagctggggctatgctataatattctcggaattttggaccgtttttccgattaaggtttttagctccattgcgccaacccccgctctccagactccttcgttatccagcattcagcatggacaggttcaaaaaataaaatttcttgatatgggtccacttcaaacatgcgcctacctgtaggaaaaaaaaagagaacataaatatgccgcgaacagaaaacgtaatgtactgttctatatataaactgttcagatcaatcataaattctcagtttcaaactttccgctcagccagattttattcgtaaagaacgcatcattggctctatgttgaaggatcagttcttgttatgggttgctttgatagcgagcgtaccggtttccggcgtgatggcagctcctagcgagtccgggcataacacggttgaaaaacgagatgccaaaaacgttgttggcgttcaacagttggacttcttcagagtacagaagattaagtgagagaattctaccgttcgtatagcatacattatacgaagttatttcagtaatgtcttgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacccaaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttttgacggctagctcagtcctaggta
cgctagcattaaagaggagaaaatgactactcttgatgacacagcctacagatataggacatcagttccgggtgacgcagaggctatcgaagccttggacggttcattcactactgatacggtgtttagagtcaccgctacaggtgatggcttcaccttgagagaggttcctgtagacccacccttaacgaaagttttccctgatgacgaatcggatgacgagtctgatgctggtgaggacggtgaccctgattccagaacatttgtcgcatacggagatgatggtgacctggctggctttgttgtggtgtcctacagcggatggaatcgtagactcacagttgaggacatcgaagttgcacctgaacatcgtggtcacggtgttggtcgtgcactgatgggactggcaacagagtttgctagagaaagaggagccggacatttgtggttagaagtgaccaatgtcaacgctcctgctattcacgcatataggcgaatgggtttcactttgtgcggtcttgatactgctttgtatgacggaactgcttctgatggtgaacaagctctttacatgagtatgccatgtccatagcacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctataacttcgtatagcatacattataccttgttatgcggccgcaagaagttgattgagactttcaacgagggtccccttcagctacctttctctctgtttggtagttattctcggcgtgtgtatagtatagtataaaagggcctacattggataggcttcaacattcctcaataaacaaacatccaacatcgcgcattccgcatttcgcatttcacatttcgcgcctgccttcctttaggttctttgaatcatcatcaatcgtcgccgtctacatcagagcaggacttatctttgccttccccaaaaattgccactccgtcaaatagattcttttgaatccttgactatttttgcctaaataggtttttgttagtttttcttcaaagcccaaaagaaactctatttagattcatccagaaacaatctttttctcaccccatttcgaagtgccgtggagcacagacataaaaagatgactaccgttcaacctacagggccagacaggctcaccctgccgcatattctactggaattcaacgatggctcctcgcagcatgcagtgatcgagctaagcatgaacgaggggattaatatatccacccatgagtggaatccatccactaatgagcaatcgccacgggaagagagagcaccaccccaacaatccaatccatcgcatcatccagaatcatcgaacatagctactcaaagtcccgctcaggaaaccgagactcagcccggcattccaggactagataggcctgcctttgatacctcggcaacggggtcgtcagaacaggttgacccagtacagggaaggatcctggatgatattataggccaatcattaaggacttccgaagaagacgataccgaatcccgccagagaccacgagaccagaagaacattatgatcaccgtgaattacttgtacgcagacgacacaaattccagaagtgctaatacaaacaaccagacgcccaataacacttctagaacttccgacagtgaacgtgtgggctccttatcgttgcacgttccggatctaccagataatgccgacgattactatatcgatgtactcattaaa
ctaaccacaagcattgccctcagcgtcatcacgtccatgatcaagaaacgattagggcttagcagggaPAS_chr4_0584 486tactacaggctggctgttcctcgcatggtgtttaatgtcctgactgggttttcgtttatcggtattaccggagccaccttgHomology Arm 1actgtaagggaacgatactggactaagagagtaatgcgaaaggcaacagcgtttctggcgaacctaatcaatgacggttacgagtttactactcctaaagccagtcttattttgctagagcgagtcaacgcttacttaaagggccagggacctaattatgacatcgattttgacgagcaggaggcgttcattaaagaaatggaggagttgaggacctctggtggatatgagaacagatactcatattcaggaaccgatgaaacacccagagatccgggttgcctgtttcttcccattgctttaaataaatggcactttgatgtgctagactgcctgaggatatacggtactcaggaagatctggaatctaaattattaagtgttcagcaattggtgttacaatgttgcatgaagcacagtggcatgactccagacatggtctttgcaacggaagtagctcagaagccgaccttcgaagacgacatagtttgtgatgatattgacgcttatgcccaggggggtgattgtctagattattgttacacgccaagcaattactccagaactttagaaattcatggcaagattgctaccttacaacgagagctggggctatgctataatattctcggaattttggaccgtttttccgattaaggtttttagctccattgcgccaacccccgctctccagactccttcgttatccagcattcagcatggacaggttcaaaaaataaaatttcttgatatgggtccacttcaaacatgcgcctacctgtaggaaaaaaaaagagaacataaatatgccgcgaacagaaaacgtaatgtactgttctatatataaactgttcagatcaatcataaattctcagtttcaaactttccgctcagccagattttattcgtaaagaacgcatcattggctctatgttgaaggatcagttcttgttatgggttgctttgatagcgagcgtaccggtttccggcgtgatggcagctcctagcgagtccgggcataacacggttgaaaaacgagatgccaaaaacgttgttggcgttcaacagttggactt PAS_chr4_0584 487ggtccccttcagctacctttctctctgtttggtagttattctcggcgtgtgtatagtatagtataaaagggcctacattggHomology Arm 
2ataggcttcaacattcctcaataaacaaacatccaacatcgcgcattccgcatttcgcatttcacatttcgcgcctgccttcctttaggttctttgaatcatcatcaatcgtcgccgtctacatcagagcaggacttatctttgccttccccaaaaattgccactccgtcaaatagattcttttgaatccttgactatttttgcctaaataggtttttgttagtttttcttcaaagcccaaaagaaactctatttagattcatccagaaacaatctttttctcaccccatttcgaagtgccgtggagcacagacataaaaagatgactaccgttcaacctacagggccagacaggctcaccctgccgcatattctactggaattcaacgatggctcctcgcagcatgcagtgatcgagctaagcatgaacgaggggattaatatatccacccatgagtggaatccatccactaatgagcaatcgccacgggaagagagagcaccaccccaacaatccaatccatcgcatcatccagaatcatcgaacatagctactcaaagtcccgctcaggaaaccgagactcagcccggcattccaggactagataggcctgcctttgatacctcggcaacggggtcgtcagaacaggttgacccagtacagggaaggatcctggatgatattataggccaatcattaaggacttccgaagaagacgataccgaatcccgccagagaccacgagaccagaagaacattatgatcaccgtgaattacttgtacgcagacgacacaaattccagaagtgctaatacaaacaaccagacgcccaataacacttctagaacttccgacagtgaacgtgtgggctccttatcgttgcacgttccggatctaccagataatgccgacgattactatatcgatgtactcattaaactaaccacaagcattgccctcagcgtcatcacgtccatgatcaagaaacgattagggcttagcaggga Nourseothricin 488gccttctcgtgcaatcagagctgttgaaagagagaagagggcacacggaagctgctgttcaattgtgtgaattgaccggatcassette withtacaacctgctggagtgataggagagctggttcgtgacgaggacggctctatgatgcgattagacgactgtgttcagtttghomology 
armsgtctccgccacaacgtaaaaattatcaaccttgaccagatcattgaatacatggattccaagaacagctagatacgatggatargetingtaggaatacagagatatcatgattgaggaacgtaagagctttttcgaaagtgtgagtttgtggtgagggccaggcggtgggPAS_chr3_1157gaggtggtggggagcctccttggtcgaatgtagatatagtaagcaagacacaagagcgcgcgaagtcttcaacgaggcggcgttgggtcttgtacgcaacgtaatgactacacagttgagcttgtcgcgaaccggtcgacattttgatcatgcatactatgttgagacaccatctcgtactattgcggcaaccagctgtaaatttgactaattaaagctgatgaaggatgcagggcgtcgtcaattttttgattgattgcatttaattgtttgagccattcaaggctgaatgcccggcaccctagacccttcttgtgagtactataaacccgcaggcagggtacccttggccttctgcgagactaccagtcataacgtatatccacaatgtactagtaatagccccggaaaactctaatcccacagaacgtctaacgcctcctatgtcatcgatacccattcgcactactgccatggccccccttacgtgatcatttcacttactcccgcctaagcttcgcccacatgcctgcgttttgccaagatttactgacgagtttggtttactcatcctctatttataactactagactttcaccattcttcaccaccctcgtgccaatgatcatcaaccacttggtattgacagccctcagcattgcactagcaagtgcgcaactccaatcgcctttcacttcagagtacagaagattaagtgagagaattctaccgttcgtatagcatacattatacgaagttatttcagtaatgtcttgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacccaaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttttgacggctagctcagtcctaggtacgctagcattaaagaggagaaaatgactactcttgatgacacagcctacagatataggacatcagttccgggtgacgcagaggctatcgaagccttggacggttcattcactactgatacggtgtttagagtcaccgctacaggtgatggcttcaccttgagagaggttcctgtagacccacccttaacgaaagttttccctgatgacgaatcggatgacgagtctgatgctggtgaggacggtgaccctgattccagaacatttgtcgcatacggagatgatggtgacctggctggctttgttgtggtgtcctacagcggatggaatcgtagactcacagttgaggacatcgaagttgcacctgaacatcgtggtcacggtgttggtcgtgcactgatgggactggcaacagagtttgctagagaaagaggagccggacatttgtggttagaagtgaccaatgtcaacgctc
ctgctattcacgcatataggcgaatgggtttcactttgtgcggtcttgatactgctttgtatgacggaactgcttctgatggtgaacaagctctttacatgagtatgccatgtccatagcacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctataacttcgtatagcatacattataccttgttatgcggccgcaagaagttgattgagactttcaacgagctggctctgcttctggtacttcttcaggtgcatcttctgctactcaaaatgacgaaacatccactgatcttggagctccagctgcatctttaagtgcaacgccatgtctttttgccatcttgctgctcatgttgtagtagactttttttttcactgagtttttatgtactactgattacattgtgtaggtgtaatgatgtgcactataatactaatatagtcaaaatgctacagaggaaagtgcaggttgcctgtggtggtttttcttattagcaccctctgaacactctttacctctaacatcctcagccatgctaatcgcgcataaaataaatcttcgaacttttttccattttatgctcataaagcttccttactgtcaccttatcaaaagagcttttgccactaaagtagtcacacccagaattgctcccgaatatcgtccaacaatgctaggatctgtggaaagtttgacaaataatttgaacaccttgagcttgaagcttcctgaagttaatatccaaggctcctttccagaaagtaacccagtggaccttttgagaaactacatcactcaagaacttagtaaaatttctggagttgacaaagaattgattttcccagccttggaatggggtaccacactggaaaaaggtgatcttttgatcccagttcctcgtctgagaataaagggtgctaatcctaaagatttagccgaacaatgggctgctgcattcccaaagggtggatatcttaaagacgttattgcgcaaggacctttcttgcagttcttttttaacacatcggttctgtacaagttggtgatatctgatgctctggagagaggcgatgactttggtgcacttcctctaggaaagggacaaaaagttatagtggagttttcttctccaaatattgccaaacctttccacgctggccatcttagaagtacaatcatcggtggttttatttccaatctgtatgaaaagctgggtcatgaagttatgaggatgaattatttgggagactggggaaaacaatttggtgttcttgcagtaggatttgagcgttacggtgatgaggcaaaattaaagactgatccaatcaaccatttgtttgaggtctatgttaaaatcaaccaagatattaaggctcaatcagagtctactgaggagattgcagaagggcaatcattagatgaccaggcaagagcttttttcaagaaaatggaaaatggcgacgaatcggctgtaagcttgtggaaaagattccgtgagttatccattgagaagtacattgatacttatgcccgcctcaacatcPAS_chr3_1157 489gccttctcgtgcaatcagagctgttgaaagagagaagagggcacacggaagctgctgttcaattgtgtgaattgaccggatHomology Arm 
1tacaacctgctggagtgataggagagctggttcgtgacgaggacggctctatgatgcgattagacgactgtgttcagtttggtctccgccacaacgtaaaaattatcaaccttgaccagatcattgaatacatggattccaagaacagctagatacgatggataggaatacagagatatcatgattgaggaacgtaagagctttttcgaaagtgtgagtttgtggtgagggccaggcggtggggaggtggtggggagcctccttggtcgaatgtagatatagtaagcaagacacaagagcgcgcgaagtcttcaacgaggcggcgttgggtcttgtacgcaacgtaatgactacacagttgagcttgtcgcgaaccggtcgacattttgatcatgcatactatgttgagacaccatctcgtactattgcggcaaccagctgtaaatttgactaattaaagctgatgaaggatgcagggcgtcgtcaattttttgattgattgcatttaattgtttgagccattcaaggctgaatgcccggcaccctagacccttcttgtgagtactataaacccgcaggcagggtacccttggccttctgcgagactaccagtcataacgtatatccacaatgtactagtaatagccccggaaaactctaatcccacagaacgtctaacgcctcctatgtcatcgatacccattcgcactactgccatggccccccttacgtgatcatttcacttactcccgcctaagcttcgcccacatgcctgcgttttgccaagatttactgacgagtttggtttactcatcctctatttataactactagactttcaccattcttcaccaccctcgtgccaatgatcatcaaccacttggtattgacagccctcagcattgcactagcaagtgcgcaactccaatcgcctttca PAS_chr3_1157 490ctggctctgcttctggtacttcttcaggtgcatcttctgctactcaaaatgacgaaacatccactgatcttggagctccagHomology Arm 
2ctgcatctttaagtgcaacgccatgtctttttgccatcttgctgctcatgttgtagtagactttttttttcactgagtttttatgtactactgattacattgtgtaggtgtaatgatgtgcactataatactaatatagtcaaaatgctacagaggaaagtgcaggttgcctgtggtggtttttcttattagcaccctctgaacactctttacctctaacatcctcagccatgctaatcgcgcataaaataaatcttcgaacttttttccattttatgctcataaagcttccttactgtcaccttatcaaaagagcttttgccactaaagtagtcacacccagaattgctcccgaatatcgtccaacaatgctaggatctgtggaaagtttgacaaataatttgaacaccttgagcttgaagcttcctgaagttaatatccaaggctcctttccagaaagtaacccagtggaccttttgagaaactacatcactcaagaacttagtaaaatttctggagttgacaaagaattgattttcccagccttggaatggggtaccacactggaaaaaggtgatcttttgatcccagttcctcgtctgagaataaagggtgctaatcctaaagatttagccgaacaatgggctgctgcattcccaaagggtggatatcttaaagacgttattgcgcaaggacctttcttgcagttcttttttaacacatcggttctgtacaagttggtgatatctgatgctctggagagaggcgatgactttggtgcacttcctctaggaaagggacaaaaagttatagtggagttttcttctccaaatattgccaaacctttccacgctggccatcttagaagtacaatcatcggtggttttatttccaatctgtatgaaaagctgggtcatgaagttatgaggatgaattatttgggagactggggaaaacaatttggtgttcttgcagtaggatttgagcgttacggtgatgaggcaaaattaaagactgatccaatcaaccatttgtttgaggtctatgttaaaatcaaccaagatattaaggctcaatcagagtctactgaggagattgcagaagggcaatcattagatgaccaggcaagagcttttttcaagaaaatggaaaatggcgacgaatcggctgtaagcttgtggaaaagattccgtgagttatccattgagaagtacattgatacttatgcccgcctcaacatc Nourseothricin 491gacgagacgctgttcctttcaacttgtccacttggactgacaagtcaacacctgttactaattcttttgtcatctctcagtcassette withatgaagacacgcgtgttcctcaatcagccaccagttctacacatccaaacatacctaaacacgccaaagagtatccgttaghomology 
armscaaatgggccacctgggtggtgttggaattcccattccagtatgtcgacagaccaaccaatatatccaggacaccaatatctargetingcaccaccgcttcagcagcactaccactttgcttcacccaggcaactatcaaactctagctctgggacgtcatccgttccttPAS_chr1-4_0289tccaaccaccccctgctggtcaattacaaccacaaggtaattctatgttcatacacatgccattttcgctaaatggcccaccagctgctggacagcaattgataccaccccaaggactagcctcaatacctgtcggccccggcaacaacagttccctattggttagccaaggtgcacctggcggctattctttagcttcaccagcgttgtcaccggtagatgcgaccttcgaagatcccgtcaagagactgcccaaaaagcggacaaaaactggatgtctcacttgccgtaagagacgaatcaaatgtgacgaacgcaagccgttctgtttcaactgtgaaaaaagcaaaaaggtgtgtactggttttacgcatctattcaaagatccccctagcaaatcctaccctcccagttcagatggtgcctcccctgttgccaatgaccaccctgtccccccaaggcaaaactttggtgaattgaggggcagtctgaattacatcatcaactagaagaatgcttattccttttctctactgtataatcacgacgttatgtcctttaatataagaaacgacaattaaaccactttaggtggacataatccatttctggatgctgttcgatgtgtagtgtctaaaccgatactgagatttctctttctctttctcttttttttttttttcctaccatttccttcaagaaaatacacctttcgacagatcatcataaatggtggcctctcttcacacttcagagtacagaagattaagtgagagaattctaccgttcgtatagcatacattatacgaagttatttcagtaatgtcttgtttcttttgttgcagtggtgagccattttgacttcgtgaaagtttctttagaatagttgtttccagaggccaaacattccacccgtagtaaagtgcaagcgtaggaagaccaagactggcataaatcaggtataagtgtcgagcactggcaggtgatcttctgaaagtttctactagcagataagatccagtagtcatgcatatggcaacaatgtaccgtgtggatctaagaacgcgtcctactaaccttcgcattcgttggtccagtttgttgttatcgatcaacgtgacaaggttgtcgattccgcgtaagcatgcatacccaaggacgcctgttgcaattccaagtgagccagttccaacaatctttgtaatattagagcacttcattgtgttgcgcttgaaagtaaaatgcgaacaaattaagagataatctcgaaaccgcgacttcaaacgccaatatgatgtgcggcacacaataagcgttcatatccgctgggtgactttctcgctttaaaaaattatccgaaaaaatttttgacggctagctcagtcctaggtacgctagcattaaagaggagaaaatgactactcttgatgacacagcctacagatataggacatcagttccgggtgacgcagaggctatcgaagccttggacggttcattcactactgatacggtgtttagagtcaccgctacaggtgatggcttcaccttgagagaggttcctgtagacccacccttaacgaaagttttccctgatgacgaatcggatgacgagtctgatgctggtgaggacggtgaccctgattccagaacatttgtcgcatacggagatgatggtgacctggctggctttgttgtggtgtcctacagcggatggaatcgtagactcacagttgaggacatcgaagttgcacctgaacatcgtggtcacggtgttggtcgtgcactgatgggactggcaacagagttt
gctagagaaagaggagccggacatttgtggttagaagtgaccaatgtcaacgctcctgctattcacgcatataggcgaatgggtttcactttgtgcggtcttgatactgctttgtatgacggaactgcttctgatggtgaacaagctctttacatgagtatgccatgtccatagcacgtccgacggcggcccacgggtcccaggcctcggagatccgtcccccttttcctttgtcgatatcatgtaattagttatgtcacgcttacattcacgccctccccccacatccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctatttatttttttatagttatgttagtattaagaacgttatttatatttcaaatttttcttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgcttgagaaggttttgggacgctcgaaggctttaatttgcaagctataacttcgtatagcatacattataccttgttatgcggccgcaagaagttgattgagactttcaacgagtgatcgactacttggcctccgccgtgaaaactcaattagatgttagctccaaattaatgaacctggtacaagatgataaataggaactcaaatacaaagcctaccattaatgactgttttatttttatactaaagtagctaaagggtgattatcaaggagtggttaacgatctattcctagcagggcactcagctcatcgatctttccaatatcggcgtataacgcttccacttctatcaacgtatcttcgttaaaaagaccacctctggtgggaactaatccttctgctgccgcctctgctaaactctgtcttcgaatccgtttcttactaacatcagcttcgacagataagccactcttctttatctttttcttagatcctgttttgaatctcagggactttactggtgccataacaacttcctgttccagtaccttgttcttcttactcttttttggtattaaagaatgtcccgccttgagtcctcgatcatccttggccatactcaatcgtctagtagtgctgttgaaatgctgtaaagaagaggaatatcttcttaaatggttggtatctttttcagcaaccacacctttgtttcggaaagcggataatggcacattgcttggattgatagaagaagctataaaagcccatcctgcgtttggagcagtttgattgctctgagttactatgttcaactgtgtattggcaaaagccttagagtcgctgtctgattcgcttatattgagtaaatcatccaggtccaatagaggaacagaaccagtctgcttcccttttggttttgtacgatccctaattgcacccttcacagaaagttctacccgtttggactttatactgtctttgttctctgatactgatcgcattgaaaacccatcaataatctcaaagggtttgccacagtccgaggtggtccaaattccaatcactggagggataggatccactttggaagatgccagaacttcttttgcaattttggtaccaatttttttattggatgttttgggaagagcttcatcttcatcagtggagttgctgctttcgttgtcatctactttttggtcatcttctagttcgtcgtcgtctgaagcaatagcatctgaggaggacgcatctccttcacctttgaaaaagtaattaaataggtaggagtcatcatcagaatcttgttcttggtctgatcccctttcgacggcagcttgaatgttgtt PAS_chr1-4_0289 492gacgagacgctgttcctttcaacttgtccacttggactgacaagtcaacacctgttactaattcttttgtcatctctcagtHomology Arm 
1atgaagacacgcgtgttcctcaatcagccaccagttctacacatccaaacatacctaaacacgccaaagagtatccgttagcaaatgggccacctgggtggtgttggaattcccattccagtatgtcgacagaccaaccaatatatccaggacaccaatatccaccaccgcttcagcagcactaccactttgcttcacccaggcaactatcaaactctagctctgggacgtcatccgttcctttccaaccaccccctgctggtcaattacaaccacaaggtaattctatgttcatacacatgccattttcgctaaatggcccaccagctgctggacagcaattgataccaccccaaggactagcctcaatacctgtcggccccggcaacaacagttccctattggttagccaaggtgcacctggcggctattctttagcttcaccagcgttgtcaccggtagatgcgaccttcgaagatcccgtcaagagactgcccaaaaagcggacaaaaactggatgtctcacttgccgtaagagacgaatcaaatgtgacgaacgcaagccgttctgtttcaactgtgaaaaaagcaaaaaggtgtgtactggttttacgcatctattcaaagatccccctagcaaatcctaccctcccagttcagatggtgcctcccctgttgccaatgaccaccctgtccccccaaggcaaaactttggtgaattgaggggcagtctgaattacatcatcaactagaagaatgcttattccttttctctactgtataatcacgacgttatgtcctttaatataagaaacgacaattaaaccactttaggtggacataatccatttctggatgctgttcgatgtgtagtgtctaaaccgatactgagatttctctttctctttctcttttttttttttttcctaccatttccttcaagaaaatacacctttcgacagatcatcataaatggtggcctctcttcaca PAS_chr1-4_0289 493tgatcgactacttggcctccgccgtgaaaactcaattagatgttagctccaaattaatgaacctggtacaagatgataaatHomology Arm 
2aggaactcaaatacaaagcctaccattaatgactgttttatttttatactaaagtagctaaagggtgattatcaaggagtggttaacgatctattcctagcagggcactcagctcatcgatctttccaatatcggcgtataacgcttccacttctatcaacgtatcttcgttaaaaagaccacctctggtgggaactaatccttctgctgccgcctctgctaaactctgtcttcgaatccgtttcttactaacatcagcttcgacagataagccactcttctttatctttttcttagatcctgttttgaatctcagggactttactggtgccataacaacttcctgttccagtaccttgttcttcttactcttttttggtattaaagaatgtcccgccttgagtcctcgatcatccttggccatactcaatcgtctagtagtgctgttgaaatgctgtaaagaagaggaatatcttcttaaatggttggtatctttttcagcaaccacacctttgtttcggaaagcggataatggcacattgcttggattgatagaagaagctataaaagcccatcctgcgtttggagcagtttgattgctctgagttactatgttcaactgtgtattggcaaaagccttagagtcgctgtctgattcgcttatattgagtaaatcatccaggtccaatagaggaacagaaccagtctgcttcccttttggttttgtacgatccctaattgcacccttcacagaaagttctacccgtttggactttatactgtctttgttctctgatactgatcgcattgaaaacccatcaataatctcaaagggtttgccacagtccgaggtggtccaaattccaatcactggagggataggatccactttggaagatgccagaacttcttttgcaattttggtaccaatttttttattggatgttttgggaagagcttcatcttcatcagtggagttgctgctttcgttgtcatctactttttggtcatcttctagttcgtcgtcgtctgaagcaatagcatctgaggaggacgcatctccttcacctttgaaaaagtaattaaataggtaggagtcatcatcagaatcttgttcttggtctgatcccctttcgacggcagcttgaatgttgtt
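
Comparing the entries of Tables 11 and 12, each full cassette (for example SEQ ID NO: 485) appears to be the concatenation of Homology Arm 1 (SEQ ID NO: 486), the plasmid sequence of Table 11 (SEQ ID NO: 476), and Homology Arm 2 (SEQ ID NO: 487). The following Python sketch illustrates that layout. The function names and the short toy sequences are hypothetical, chosen only for illustration, and are not part of the Sequence Listing.

```python
def assemble_cassette(arm1: str, marker: str, arm2: str) -> str:
    """Join homology arm 1, the marker (plasmid) sequence, and homology arm 2."""
    return (arm1 + marker + arm2).lower()


def split_cassette(cassette: str, arm1: str, arm2: str) -> str:
    """Return the sequence lying between the two homology arms.

    Raises ValueError if the cassette does not start with arm1 and end with arm2.
    """
    cassette, arm1, arm2 = cassette.lower(), arm1.lower(), arm2.lower()
    if len(arm1) + len(arm2) > len(cassette):
        raise ValueError("cassette is shorter than the two arms combined")
    if not cassette.startswith(arm1):
        raise ValueError("cassette does not begin with homology arm 1")
    if not cassette.endswith(arm2):
        raise ValueError("cassette does not end with homology arm 2")
    return cassette[len(arm1):len(cassette) - len(arm2)]


# Toy placeholder sequences (hypothetical, NOT taken from the Sequence Listing).
toy_arm1 = "acgtacgt"
toy_marker = "ttttcccc"
toy_arm2 = "ggaattcc"

toy_cassette = assemble_cassette(toy_arm1, toy_marker, toy_arm2)
recovered_marker = split_cassette(toy_cassette, toy_arm1, toy_arm2)
```

In practice the same check could be run on the sequences read from the Sequence Listing file to confirm that a cassette entry is consistent with its listed homology arms.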

The invention claimed is:
 1. A Pichia pastoris microorganism, in which the activity of a YPS1-1 protease comprising a polypeptide sequence at least 95% identical to SEQ ID NO: 67 and a YPS1-2 protease comprising a polypeptide sequence at least 95% identical to SEQ ID NO: 68 has been attenuated or eliminated, wherein said polypeptide sequence at least 95% identical to SEQ ID NO: 67 and said polypeptide sequence at least 95% identical to SEQ ID NO: 68 each have a protease activity before said attenuation or elimination, and wherein said microorganism expresses a recombinant protein.
 2. The microorganism of claim 1, wherein said YPS1-1 protease comprises SEQ ID NO: 67.
 3. The microorganism of claim 1, wherein said YPS1-1 protease is encoded by a YPS1-1 gene comprising a polynucleotide sequence at least 95% identical to SEQ ID NO: 1 and encoding a polypeptide having protease activity.
 4. The microorganism of claim 3, wherein said YPS1-1 gene comprises SEQ ID NO: 1.
 5. The microorganism of claim 1, wherein said YPS1-2 protease comprises SEQ ID NO: 68.
 6. The microorganism of claim 1, wherein said YPS1-2 protease is encoded by a YPS1-2 gene comprising a polynucleotide sequence at least 95% identical to SEQ ID NO: 2 and encoding a polypeptide having protease activity.
 7. The microorganism of claim 6, wherein said YPS1-2 gene comprises SEQ ID NO: 2.
 8. The microorganism of claim 1, wherein said YPS1-1 protease is encoded by a YPS1-1 gene, wherein said YPS1-2 protease is encoded by a YPS1-2 gene, and wherein said YPS1-1 gene or said YPS1-2 gene, or both, has been mutated or knocked out.
 9. The microorganism of claim 1, wherein said recombinant protein comprises one or more repeat sequences {GGY-[GPG-X₁]n₁-GPS-(A)n₂}n₃ (SEQ ID NO: 514), wherein X₁ = SGGQQ (SEQ ID NO: 515), GAGQQ (SEQ ID NO: 516), GQGPY (SEQ ID NO: 517), AGQQ (SEQ ID NO: 518) or SQ; n₁ is from 4 to 8; n₂ is from 6 to 20; and n₃ is from 2 to 20, wherein said one or more repeat sequences are a silk-like polypeptide.
 10. The microorganism of claim 9, wherein said recombinant protein comprises SEQ ID NO: 463.
 11. The microorganism of claim 1, wherein the activity of one or more additional proteases has been attenuated or eliminated.
 12. A Pichia pastoris engineered microorganism comprising YPS1-1 and YPS1-2 activity reduced by a mutation or deletion of the YPS1-1 gene comprising SEQ ID NO: 1 and the YPS1-2 gene comprising SEQ ID NO: 2, wherein said microorganism further comprises a recombinantly expressed protein comprising a polypeptide sequence comprising SEQ ID NO: 463.
 13. A cell culture comprising the microorganism of claim 1.
 14. A cell culture comprising the microorganism of claim 1, wherein said recombinantly expressed protein is less degraded than a cell culture comprising an otherwise identical Pichia pastoris microorganism whose YPS1-1 and YPS1-2 activity has not been attenuated or eliminated.
 15. A method of producing a recombinant protein with a reduced degradation, comprising: culturing the microorganism of claim 1 in a culture medium under conditions suitable for expression of the recombinantly expressed protein; and isolating the recombinant protein from the microorganism or the culture medium.
 16. The method of claim 15, wherein said recombinant protein is secreted from said microorganism, and wherein isolating said recombinant protein comprises collecting a culture medium comprising said secreted recombinant protein.
 17. The method of claim 15, wherein said recombinant protein has a decreased level of degradation as compared to said recombinant protein produced by an otherwise identical microorganism wherein said YPS1-1 and said YPS1-2 protease activity has not been attenuated or eliminated.
 18. A method of making the Pichia pastoris microorganism of claim 1 comprising knocking out or mutating a gene encoding the YPS1-1 protein and a gene encoding the YPS1-2 protein.
 19. The method of claim 18, wherein said recombinantly expressed protein comprises a polyA sequence comprising at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 contiguous alanine residues (SEQ ID NO: 519).
 20. The method of claim 18, wherein said recombinantly expressed protein comprises a silk-like polypeptide.
 21. The method of claim 20, wherein said silk-like polypeptide comprises one or more repeat sequences {GGY-[GPG-X₁]n₁-GPS-(A)n₂}n₃ (SEQ ID NO: 514), wherein X₁ = SGGQQ (SEQ ID NO: 515) or GAGQQ (SEQ ID NO: 516) or GQGPY (SEQ ID NO: 517) or AGQQ (SEQ ID NO: 518) or SQ; n₁ is from 4 to 8; n₂ is from 6 to 20; and n₃ is from 2 to 20.
 22. The method of claim 18, wherein said recombinantly expressed protein comprises a polypeptide sequence encoded by SEQ ID NO: 462.
 23. A Pichia pastoris microorganism, in which the activity of a YPS1-1 protease comprising a polypeptide sequence at least 95% identical to SEQ ID NO: 67 and a YPS1-2 protease comprising a polypeptide sequence at least 95% identical to SEQ ID NO: 68 has been attenuated or eliminated, wherein said polypeptide sequence at least 95% identical to SEQ ID NO: 67 and said polypeptide sequence at least 95% identical to SEQ ID NO: 68 each have a protease activity before said attenuation or elimination.
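
The repeat formula {GGY-[GPG-X₁]n₁-GPS-(A)n₂}n₃ recited in claims 9 and 21 can be illustrated with a short Python sketch. It assumes that the hyphens denote direct concatenation of amino acid residues, that X₁ may be chosen independently at each position and that n₂ may differ between repeat units; these readings, together with the names `build_repeat_unit` and `matches_repeat_formula`, are assumptions made for illustration only.

```python
import re

# Allowed X1 blocks as listed in the claims.
X1_OPTIONS = ("SGGQQ", "GAGQQ", "GQGPY", "AGQQ", "SQ")


def build_repeat_unit(x1_choices, n2):
    """Build one unit GGY-[GPG-X1]n1-GPS-(A)n2, with n1 = len(x1_choices)."""
    if not 4 <= len(x1_choices) <= 8:
        raise ValueError("n1 must be from 4 to 8")
    if not 6 <= n2 <= 20:
        raise ValueError("n2 must be from 6 to 20")
    for x1 in x1_choices:
        if x1 not in X1_OPTIONS:
            raise ValueError(f"unsupported X1 block: {x1}")
    return "GGY" + "".join("GPG" + x1 for x1 in x1_choices) + "GPS" + "A" * n2


# Regular expression for n3 (2 to 20) consecutive repeat units.
_X1 = "|".join(X1_OPTIONS)
_UNIT = rf"GGY(?:GPG(?:{_X1})){{4,8}}GPSA{{6,20}}"
REPEAT_PATTERN = re.compile(rf"(?:{_UNIT}){{2,20}}")


def matches_repeat_formula(sequence):
    """Return True if the whole sequence is 2 to 20 consecutive repeat units."""
    return REPEAT_PATTERN.fullmatch(sequence) is not None


# Example: one unit with n1 = 4 and n2 = 6, repeated twice (n3 = 2).
example_unit = build_repeat_unit(["SGGQQ", "GAGQQ", "GQGPY", "AGQQ"], 6)
example_protein = example_unit * 2
```

The sketch covers only the repeat region itself; a full-length recombinant protein would typically carry additional flanking sequence that this pattern does not model.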